from:"Ashutosh Chauhan"

[ANNOUNCE] Denys Kuzmenko joins Hive PMC

2022-02-07 Thread Ashutosh Chauhan

Hi,

I'm pleased to announce that Denys has accepted an invitation to
join the Hive PMC. Denys has been a consistent and helpful
figure in the Hive community for which we are very grateful. We
look forward to the continued contributions and support.

Please join me in congratulating Denys!

Ashutosh (On behalf of Hive PMC)

[ANNOUNCE] New committer: Ayush Saxena

2022-02-07 Thread Ashutosh Chauhan

Hi all,
Apache Hive's Project Management Committee (PMC) has invited Ayush
to become a committer, and we are pleased to announce that he has accepted!

Ayush welcome, thank you for your contributions, and we look forward to your
further interactions with the community!
Ashutosh (on behalf of Hive PMC)

Welcome Marta to Hive PMC

2021-08-02 Thread Ashutosh Chauhan

Hi all,

It's an honor to announce that the Apache Hive PMC has recently voted to invite
Marta Kuczora as a new Hive PMC member. Marta is a longtime Hive
contributor and committer, and has made significant contributions to Hive.
Please join me in congratulating her; we look forward to the bigger role
that she will play in the Apache Hive project.

Thanks,
Ashutosh

Re: Request for write access to hive wiki

2021-04-14 Thread Ashutosh Chauhan

Syed,
I added you to hive cwiki.

Thanks,
Ashutosh

On Thu, Oct 15, 2020 at 9:47 PM Syed Shameerur Rahman <
syedthame...@gmail.com> wrote:

> Hello,
>
> I have created a wiki account with email : *syedthame...@gmail.com
> * Please do the needful!
>
> Thank You!
>
> On Thu, Oct 15, 2020 at 10:23 PM Ashutosh Chauhan 
> wrote:
>
>> Hi Syed,
>> Did you create account on cwiki already? If you share your id, I can give
>> you access.
>>
>> Thanks,
>> Ashutosh
>>
>> On Wed, Oct 14, 2020 at 10:27 PM Syed Shameerur Rahman <
>> syedthame...@gmail.com> wrote:
>>
>> > Hello Team,
>> >
>> > Need permission to edit the following hive wikis to reflect the new
>> > changes made recently.
>> >
>> >
>> >1.
>> >
>> >
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
>> >
>> >*Related Hive Jira* - HIVE-22957
>> ><https://issues.apache.org/jira/browse/HIVE-22957>
>> >
>> >
>> >2.
>> > https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler
>> >
>> >*Related Hive Jira* - HIVE-22392
>> ><https://issues.apache.org/jira/browse/HIVE-22392>
>> >
>> >
>> > It would be great if someone could help me with this!
>> >
>> > Thank You!
>> >
>>
>

[Announce] New committer : Naresh PR

2021-03-08 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Naresh PR to
become a committer, and we are pleased to announce that he has accepted.

Naresh welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Thanks,
Ashutosh

[Announce] New committer : Panos Garefalakis

2021-03-08 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has
invited Panos Garefalakis to become a committer, and we are pleased to
announce that he has accepted.

Panos welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Thanks,
Ashutosh

[Announce] New committer : Mustafa Iman

2021-02-01 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Mustafa Iman
to become a committer, and we are pleased to announce that he has accepted.

Mustafa welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Thanks,
Ashutosh

Welcome Adam to Hive PMC

2021-02-01 Thread Ashutosh Chauhan

Hi all,

It's an honor to announce that the Apache Hive PMC has recently voted to invite
Adam Szita as a new Hive PMC member. Adam is a longtime Hive contributor
and committer, and has made significant contributions to Hive. Please join
me in congratulating him; we look forward to the bigger role that he will
play in the Apache Hive project.

Thanks,
Ashutosh

[Announce] New committer : Peter Varga

2021-02-01 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Peter Varga to
become a committer, and we are pleased to announce that he has accepted.

Peter welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Thanks,
Ashutosh

Re: Request for write access to hive wiki

2020-10-15 Thread Ashutosh Chauhan

Hi Syed,
Did you create account on cwiki already? If you share your id, I can give
you access.

Thanks,
Ashutosh

On Wed, Oct 14, 2020 at 10:27 PM Syed Shameerur Rahman <
syedthame...@gmail.com> wrote:

> Hello Team,
>
> Need permission to edit the following hive wikis to reflect the new changes
> made recently.
>
>
>1.
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
>
>*Related Hive Jira* - HIVE-22957
>
>
>
>2.
> https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler
>
>*Related Hive Jira* - HIVE-22392
>
>
>
> It would be great if someone could help me with this!
>
> Thank You!
>

Re: Apply for permission to edit Hive Wikipages

2020-08-13 Thread Ashutosh Chauhan

Liquan,
Added you to hive wiki. Welcome to the project.

Thanks,
Ashutosh

On Thu, Aug 13, 2020 at 8:21 AM Liquan Pei  wrote:

> Thank you Panos!
>
> Hi Ashutosh and Thejas, can you approve my request to edit Hive Wiki pages?
> My Apache Confluence ID is liquanpei.
>
> Best,
> Liquan
>
> On Thu, Aug 13, 2020 at 12:18 AM Panos Garefalakis 
> wrote:
>
> > Hello Liquan,
> >
> > I believe @Ashutosh Chauhan   or @Thejas Nair
> >   could help with this.
> >
> > Cheers,
> > Panagiotis
> >
> > On Wed, Aug 12, 2020 at 8:45 PM Liquan Pei  wrote:
> >
> > > My Apache Confluence ID is liquanpei.
> > >
> > > On Wed, Aug 12, 2020 at 10:43 AM Liquan Pei 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > > I am applying for permission to edit Hive wiki pages. We
> > > > > recently succeeded in making TiDB (https://github.com/pingcap/tidb) a
> > > > > Hive metastore backend. TiDB is a scalable database that can help
> > > > > solve the
> > > > > We would like to share step-by-step instructions on how to use TiDB
> > > > > as a Hive metastore with the community.
> > > >
> > > > Best,
> > > > Liquan
> > > >
> > > > --
> > > > Liquan Pei
> > > > Senior Database Engineer, PingCAP
> > > >
> > >
> > >
> > > --
> > > Liquan Pei
> > > Software Engineer, Confluent Inc
> > >
> >
>
>
> --
> Liquan Pei
> Senior Database Engineer, PingCAP
>

Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Ashutosh Chauhan

+1

On Wed, Jun 3, 2020 at 1:23 PM David Mollitor  wrote:

> Hello Gang,
>
> I have spent some time working on upgrading Avro (far less than others):
>
> https://issues.apache.org/jira/browse/HIVE-21737
>
> This should be a relatively easy thing to do, but is blocked by
> Hive-on-Spark.  HoS has a weird thing where it downloads some
> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
>
> Since HoS is not going to receive updates from the major vendors, is it
> time to simply remove it?
>
> Tests are currently disabled:
> https://issues.apache.org/jira/browse/HIVE-23137
>
> Thanks.
>

Re: Open old PRs

2020-06-01 Thread Ashutosh Chauhan

How about using stalebot : https://github.com/probot/stale ?

On Mon, Jun 1, 2020 at 12:20 PM Zoltán Haindrich  wrote:

>
>
> Hey David,
>
> On June 1, 2020 3:52:05 PM GMT+02:00, David Mollitor 
> wrote:
> >Any idea how long it will take to run precommit on all existing PRs?
> >
> I'm not entirely sure, but a rough estimate could be:
> * not every PR is mergeable; there are many which were already
> merged/outdated/etc... let's estimate that 1/3 is mergeable
> * every PR runs for at least 1 hour
>
> this would mean 430/3*1/24 days of test execution, which is at least 6 days.
>
> I see little to no value in running tests on archaic PRs.
> We could also configure an automated job to close PRs after some time of
> inactivity:
> https://github.com/actions/stale/blob/master/README.md
> The good side of this is that it will get rid of ancient PRs; however, it
> might seem rude to a contributor who is waiting for feedback or
> something.
> Even with that argument, I think we should configure it at least for a few
> days to get rid of the dangling PRs of almost a decade! (PR #2 was opened in
> 2011)...
> What do you think?
>
> cheers,
> Zoltan
>
>
> >On Mon, Jun 1, 2020 at 9:49 AM Panos Garefalakis 
> >wrote:
> >
> >> Same here, however, there are still ~ 430 PRs pending on master.
> >> Thanks Zoltan for this great initiative!
> >>
> >> Cheers,
> >> Panagiotis
> >>
> >> On Mon, Jun 1, 2020 at 2:33 PM David Mollitor 
> >wrote:
> >>
> >> > Thanks so much for the work on this.
> >> >
> >> > Just cleaned up mine.
> >> >
> >> > On Sat, May 30, 2020 at 10:16 AM Zoltan Haindrich 
> >wrote:
> >> >
> >> > > Hey All,
> >> > >
> >> > > The new test executor will pick up any PR which doesn't yet have
> >a test
> >> > > result - now that the patch is on the master; every PR which is
> >> mergeable
> >> > > with the master branch is
> >> > > a good candidate - so the right move would be to clean up our PR
> >> backlog.
> >> > >
> >> > > I would like to ask everyone to look at
> >> > > https://github.com/apache/hive/pulls
> >> > > and close some PRs which are already submitted or just leftovers
> >from -
> >> > > primarily I would ask you to look at PRs opened by yourself...
> >> > >
> >> > > cheers,
> >> > > Zoltan
> >> > >
> >> >
> >>
>
> --
> Zoltán Haindrich
>
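
The back-of-the-envelope estimate quoted above can be reproduced with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the inputs (about 430 open PRs, roughly one third mergeable, at least one hour per run, runs executed one after another) are the assumptions stated in the thread, not measured values.

```java
/** Hypothetical helper reproducing the estimate from the thread; not part of Hive. */
final class PrBacklogEstimate {

    /** Days of serial test execution for the mergeable share of the open PRs. */
    static double estimateDays(int openPrs, double mergeableFraction, double hoursPerRun) {
        double mergeablePrs = openPrs * mergeableFraction;
        double totalHours = mergeablePrs * hoursPerRun;
        return totalHours / 24.0;
    }

    public static void main(String[] args) {
        // Assumed inputs from the thread: 430 open PRs, 1/3 mergeable, 1 hour each.
        double days = estimateDays(430, 1.0 / 3.0, 1.0);
        System.out.printf("estimated days of test execution: %.2f%n", days);
    }
}
```

Since one hour per run is a lower bound, the result is a lower bound as well; parallel executors would shorten the wall-clock time accordingly.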

Re: Ignored tests

2020-05-30 Thread Ashutosh Chauhan

Thanks, Zoltan, for compiling the list. I have added extra columns to your
sheet to highlight which tests we can continue to ignore for the
foreseeable future, either because they belong to a feature that is not
heavily used and is under consideration for removal, or because the feature
has already been removed. There is no point in maintaining such tests.

Ashutosh

On Sat, May 30, 2020 at 7:11 AM Zoltan Haindrich  wrote:

> Hey All,
>
> I've collected the actual list of ignored tests.
> (git grep @Ignore;git grep qt:disabled) |grep /test/|sed 's/:/\t/'
>
> And put all of them into a spreadsheet:
>
> https://docs.google.com/spreadsheets/d/1jM_npmDO2OgPPGDvOHsqdyLOlU64_nmtX3_frdO3f34/edit?usp=sharing
>
> If you have few minutes please take a quick look - the sheet is free to
> edit by anyone.
> Right now the job which could be used to check that a test seems to be
> stable enough is not available.
>
> cheers,
> Zoltan
>

Review Request 72544: Remove hcatalog streaming

2020-05-27 Thread Ashutosh Chauhan

/hadoop/hive/ql/txn/compactor/TestCompactor.java
 32fe535b2b 
  packaging/pom.xml 97c8cf7168 
  packaging/src/main/assembly/bin.xml 6bb4881ed2 


Diff: https://reviews.apache.org/r/72544/diff/1/


Testing
---


Thanks,

Ashutosh Chauhan

Re: Review Request 72521: HIVE-23487: Optimise PartitionManagementTask

2020-05-26 Thread Ashutosh Chauhan



> On May 26, 2020, 11:48 p.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
> > Lines 87-92 (original), 87-97 (patched)
> > <https://reviews.apache.org/r/72521/diff/1/?file=2232456#file2232456line87>
> >
> > I don't follow how this is an improvement. new Configuration(), which I 
> > assume is an expensive call, is still there.
> > If anything, it appears that this change would make perf worse, since 
> > earlier new Conf() was guarded by if (msc == null) and so would have happened 
> > only once, but now will happen every time.
> > Can you explain how this change is more performant?
> 
> Rajesh Balamohan wrote:
> This is because, it was creating this for every table.
>  
> With the fix in "PartitionManagementTask::run", patch constructs this 
> conf only once and reuses it across tables. (i.e in Configuration msckConf = 
> Msck.getMsckConf(conf);)
> 
> Ashutosh Chauhan wrote:
> But how? Msck.getMsckConf() constructs a new Configuration() every time. 
> Also, it is invoked every time for execute(). So, I still don't see it.

I meant that run() invokes getMsckConf() for every table, which in turn does new 
Configuration().
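
To make the concern concrete, here is a minimal, self-contained sketch. `FakeConf`, `getMsckConf`, `runPerTable` and `runHoisted` are hypothetical stand-ins, not the actual Hive classes or the code in the patch; the counter only shows how many configuration objects get built when the call sits inside the per-table loop versus when it is hoisted out of it. Which of the two shapes the patch ends up with is exactly the question raised above.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: stand-ins for the real Configuration and Msck classes. */
final class ConfPerTableSketch {

    /** Stand-in for a configuration object that is assumed to be expensive to build. */
    static final class FakeConf {
        static int constructed = 0;

        FakeConf() {
            constructed++;
        }
    }

    /** Stand-in for a helper that builds a fresh configuration on every call. */
    static FakeConf getMsckConf() {
        return new FakeConf();
    }

    /** Calls the helper inside the loop: one construction per table. */
    static void runPerTable(List<String> tables) {
        for (String table : tables) {
            FakeConf conf = getMsckConf();
            repair(table, conf);
        }
    }

    /** Calls the helper once and reuses the result for every table. */
    static void runHoisted(List<String> tables) {
        FakeConf conf = getMsckConf();
        for (String table : tables) {
            repair(table, conf);
        }
    }

    /** Placeholder for the actual per-table repair work. */
    static void repair(String table, FakeConf conf) {
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t1", "t2", "t3");

        FakeConf.constructed = 0;
        runPerTable(tables);
        System.out.println("constructed per table: " + FakeConf.constructed);

        FakeConf.constructed = 0;
        runHoisted(tables);
        System.out.println("constructed when hoisted: " + FakeConf.constructed);
    }
}
```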


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72521/#review220874
---


On May 18, 2020, 12:53 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72521/
> -------
> 
> (Updated May 18, 2020, 12:53 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and prasanthj.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Msck.init for every table takes more CPU time than the actual table repair. 
> This was observed on a system which had lots of DB and tables.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckOperation.java 
> c05d699bd8 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestMsckCreatePartitionsInBatches.java
>  7821f40a82 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/TestMsckDropPartitionsInBatches.java
>  8be31128a1 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
>  f4e109d1b0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PartitionManagementTask.java
>  e4488f4709 
> 
> 
> Diff: https://reviews.apache.org/r/72521/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>

Re: Review Request 72521: HIVE-23487: Optimise PartitionManagementTask

2020-05-26 Thread Ashutosh Chauhan



> On May 26, 2020, 11:48 p.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
> > Lines 87-92 (original), 87-97 (patched)
> > <https://reviews.apache.org/r/72521/diff/1/?file=2232456#file2232456line87>
> >
> > I don't follow how this is an improvement. new Configuration(), which I 
> > assume is an expensive call, is still there.
> > If anything, it appears that this change would make perf worse, since 
> > earlier new Conf() was guarded by if (msc == null) and so would have happened 
> > only once, but now will happen every time.
> > Can you explain how this change is more performant?
> 
> Rajesh Balamohan wrote:
> This is because, it was creating this for every table.
>  
> With the fix in "PartitionManagementTask::run", patch constructs this 
> conf only once and reuses it across tables. (i.e in Configuration msckConf = 
> Msck.getMsckConf(conf);)

But how? Msck.getMsckConf() constructs a new Configuration() every time. Also, it 
is invoked every time for execute(). So, I still don't see it.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72521/#review220874
---



Re: Review Request 72521: HIVE-23487: Optimise PartitionManagementTask

2020-05-26 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72521/#review220874
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
Lines 87-92 (original), 87-97 (patched)
<https://reviews.apache.org/r/72521/#comment309594>

I don't follow how this is an improvement. new Configuration(), which I 
assume is an expensive call, is still there.
If anything, it appears that this change would make perf worse, since 
earlier new Conf() was guarded by if (msc == null) and so would have happened only 
once, but now will happen every time.
Can you explain how this change is more performant?


- Ashutosh Chauhan



Re: Review Request 72519: HIVE-23292

2020-05-16 Thread Ashutosh Chauhan

 
  ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out 1142daba9c 
  ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 17f3b0b360 
  ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out 51bb46b399 
  ql/src/test/results/clientpositive/temp_table_partition_pruning.q.out 
f6fdd61928 
  ql/src/test/results/clientpositive/timestamp.q.out 90a46f58f4 
  ql/src/test/results/clientpositive/transform_ppr1.q.out 25468bcd9c 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 8aeb688513 
  ql/src/test/results/clientpositive/truncate_column_list_bucket.q.out 
c8e40bd447 
  ql/src/test/results/clientpositive/udf_explode.q.out 0143f3160b 
  ql/src/test/results/clientpositive/udtf_explode.q.out 1b941b87bb 
  ql/src/test/results/clientpositive/union22.q.out de36e44dfb 
  ql/src/test/results/clientpositive/union24.q.out 32a86e7f02 
  ql/src/test/results/clientpositive/union_ppr.q.out b841994373 
  serde/src/java/org/apache/hadoop/hive/serde2/dynamic_type/DynamicSerDe.java 
2b832ac436 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
 18f689ebf4 


Diff: https://reviews.apache.org/r/72519/diff/2/

Changes: https://reviews.apache.org/r/72519/diff/1-2/


Testing
---


Thanks,

Ashutosh Chauhan

[jira] [Created] (HIVE-23483) Remove DynamicSerDe

2020-05-16 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23483:
---

 Summary: Remove DynamicSerDe
 Key: HIVE-23483
 URL: https://issues.apache.org/jira/browse/HIVE-23483
 Project: Hive
  Issue Type: Task
Reporter: Ashutosh Chauhan


It is used to read thrift data files. AFAIK no one uses thrift for data 
serialization.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Review Request 72519: HIVE-23292

2020-05-16 Thread Ashutosh Chauhan

/pointlookup2.q.out 01fadb3c62 
  ql/src/test/results/clientpositive/llap/pointlookup3.q.out d945be2023 
  ql/src/test/results/clientpositive/llap/pointlookup4.q.out 3ca21d7460 
  ql/src/test/results/clientpositive/llap/ppd_join_filter.q.out 5145494c27 
  ql/src/test/results/clientpositive/llap/ppd_union_view.q.out d16d28b64b 
  ql/src/test/results/clientpositive/llap/ppd_vc.q.out ebb3363172 
  ql/src/test/results/clientpositive/llap/push_or.q.out 1ac850df8f 
  ql/src/test/results/clientpositive/llap/rand_partitionpruner2.q.out 
ef5509281a 
  ql/src/test/results/clientpositive/llap/router_join_ppr.q.out de20bb6209 
  ql/src/test/results/clientpositive/llap/sample1.q.out 81a821d906 
  ql/src/test/results/clientpositive/llap/sample10.q.out e1226296c9 
  ql/src/test/results/clientpositive/llap/sample5.q.out d36a43679f 
  ql/src/test/results/clientpositive/llap/sample6.q.out cb4756329d 
  ql/src/test/results/clientpositive/llap/sample7.q.out 369a4c6ef4 
  ql/src/test/results/clientpositive/llap/sample8.q.out cda918e8c4 
  ql/src/test/results/clientpositive/llap/sharedwork.q.out 175141fb9e 
  ql/src/test/results/clientpositive/llap/smb_mapjoin_15.q.out dbc180ccae 
  ql/src/test/results/clientpositive/llap/stats0.q.out 695ed643ab 
  ql/src/test/results/clientpositive/llap/stats11.q.out 71a1d9da15 
  ql/src/test/results/clientpositive/llap/stats12.q.out b82bb0bfcd 
  ql/src/test/results/clientpositive/llap/stats13.q.out 6954cbd0b1 
  
ql/src/test/results/clientpositive/llap/temp_table_alter_partition_coltype.q.out
 ead9709817 
  
ql/src/test/results/clientpositive/llap/temp_table_display_colstats_tbllvl.q.out
 29fb49bdbd 
  ql/src/test/results/clientpositive/llap/tez_fixed_bucket_pruning.q.out 
bbb7d37fee 
  ql/src/test/results/clientpositive/llap/topnkey_windowing.q.out 6bf0dd418e 
  ql/src/test/results/clientpositive/llap/vectorization_0.q.out 2c00a799d6 
  ql/src/test/results/clientpositive/regexp_extract.q.out 95f7c22bc9 
  ql/src/test/results/clientpositive/serde_user_properties.q.out ac2b2ee6c9 
  ql/src/test/results/clientpositive/sort_merge_join_desc_5.q.out 1142daba9c 
  ql/src/test/results/clientpositive/sort_merge_join_desc_6.q.out 17f3b0b360 
  ql/src/test/results/clientpositive/sort_merge_join_desc_7.q.out 51bb46b399 
  ql/src/test/results/clientpositive/temp_table_partition_pruning.q.out 
f6fdd61928 
  ql/src/test/results/clientpositive/timestamp.q.out 90a46f58f4 
  ql/src/test/results/clientpositive/transform_ppr1.q.out 25468bcd9c 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 8aeb688513 
  ql/src/test/results/clientpositive/truncate_column_list_bucket.q.out 
c8e40bd447 
  ql/src/test/results/clientpositive/udf_explode.q.out 0143f3160b 
  ql/src/test/results/clientpositive/udtf_explode.q.out 1b941b87bb 
  ql/src/test/results/clientpositive/union22.q.out de36e44dfb 
  ql/src/test/results/clientpositive/union24.q.out 32a86e7f02 
  ql/src/test/results/clientpositive/union_ppr.q.out b841994373 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
 18f689ebf4 


Diff: https://reviews.apache.org/r/72519/diff/1/


Testing
---


Thanks,

Ashutosh Chauhan

Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Ashutosh Chauhan



> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 610 (patched)
> > <https://reviews.apache.org/r/72499/diff/1/?file=2231454#file2231454line611>
> >
> > What's the reason for time based expiry? Is it because UGI expires 
> > after 24 hrs?
> > Else, I would have expected long living cache with blocking queue of 
> > bounded size.
> > 
> > Queue should be bounded by number of executors anyways, since having 
> > more connections than executors probably won't be needed.
> 
> Rajesh Balamohan wrote:
> Since this is based on the AM. So if AM dies after sometime (due to 
> inactivity, as in no-DAG submissions), these UGIs will be auto purged after 
> 10 minutes.

In LLAP, the AM doesn't die due to inactivity; it's long-lived. It may die because 
of a crash, though. So should this expiry be longer, say 3 hrs?


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 646-651 (patched)
> > <https://reviews.apache.org/r/72499/diff/1/?file=2231454#file2231454line647>
> >
> > Is this logic needed? You already have valueloader in get() which must 
> > return a ugi, so it cant be null.
> 
> Rajesh Balamohan wrote:
> Yes, value loader is for initial miss. This is to avoid single connection 
> becoming a contention for AM communication. 
> https://issues.apache.org/jira/browse/HIVE-16634

Not sure I follow. Can you add comments in code to explain the need for this?


> On May 14, 2020, 4:31 a.m., Ashutosh Chauhan wrote:
> > llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
> > Lines 663 (patched)
> > <https://reviews.apache.org/r/72499/diff/1/?file=2231454#file2231454line664>
> >
> > if its null, then its programming error. Better to not do this null 
> > check and offer without checking for null.

Better to throw an NPE than to leak a UGI by failing to return it to the pool.
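
A small sketch of that argument, using a generic pool rather than the ContainerRunnerImpl code under review (the class and method names here are hypothetical). `BlockingQueue` implementations such as `ArrayBlockingQueue` reject null elements, so an unguarded `offer` turns a null return into an immediate `NullPointerException` instead of silently skipping the hand-back.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustration only: a generic pool, not the code from the patch. */
final class ReturnToPoolSketch<T> {

    private final BlockingQueue<T> pool;

    ReturnToPoolSketch(int capacity) {
        this.pool = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Returns an entry to the pool without a null guard. A null argument
     * surfaces as a NullPointerException thrown by offer(); a full pool is
     * reported by a false return value.
     */
    boolean giveBack(T entry) {
        return pool.offer(entry);
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ReturnToPoolSketch<String> sketch = new ReturnToPoolSketch<>(2);
        sketch.giveBack("entry-1");
        try {
            sketch.giveBack(null);
        } catch (NullPointerException e) {
            System.out.println("null hand-back detected as a programming error");
        }
    }
}
```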


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---


On May 12, 2020, 12:06 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72499/
> ---
> 
> (Updated May 12, 2020, 12:06 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Gopal V.
> 
> 
> Bugs: HIVE-23446
> https://issues.apache.org/jira/browse/HIVE-23446
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently the UGI pool is maintained at QueryInfo level. However, when there 
> are short queries and lots of AMs, it ends up missing the IPC connection cache. 
> Too many connections are also established. The patch tries to avoid that by 
> maintaining this at ContainerRunner level. It retains the current behaviour 
> of having multiple connections to the same AM (otherwise it can get bottlenecked 
> on a single connection).
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryInfo.java 
> 00fed15d2b 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72499/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>

Re: Review Request 72499: HIVE-23446:LLAP: Reduce IPC connection misses to AM for short queries

2020-05-13 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72499/#review220748
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 610 (patched)
<https://reviews.apache.org/r/72499/#comment309420>

What's the reason for time based expiry? Is it because UGI expires after 24 
hrs?
Else, I would have expected long living cache with blocking queue of 
bounded size.

Queue should be bounded by number of executors anyways, since having more 
connections than executors probably won't be needed.
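
As a rough illustration of the suggestion (a long-lived pool backed by a blocking queue whose size is bounded by the number of executors), consider the sketch below. It is not the code in the patch: `BoundedPoolSketch`, `borrow` and `giveBack` are hypothetical names, and the pooled type is left generic instead of using the real UGI class.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Illustration only: a pool that keeps at most numExecutors idle entries. */
final class BoundedPoolSketch<T> {

    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    BoundedPoolSketch(int numExecutors, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(numExecutors);
        this.factory = factory;
    }

    /** Reuses an idle entry if one exists, otherwise creates a new one. */
    T borrow() {
        T entry = idle.poll();
        return entry != null ? entry : factory.get();
    }

    /** Keeps the entry for reuse unless the pool is already full. */
    boolean giveBack(T entry) {
        return idle.offer(entry);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        BoundedPoolSketch<Integer> pool =
            new BoundedPoolSketch<>(2, created::incrementAndGet);

        Integer first = pool.borrow();   // pool is empty, so the factory is called
        pool.giveBack(first);
        Integer again = pool.borrow();   // the idle entry is reused
        System.out.println("entries created: " + created.get()
            + ", reused: " + first.equals(again));
    }
}
```

With this shape there is no timer to tune; entries beyond the bound are simply dropped when handed back.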



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 617 (patched)
<https://reviews.apache.org/r/72499/#comment309419>

LOG.debug



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 638 (patched)
<https://reviews.apache.org/r/72499/#comment309423>

Bound this queue by number of executors?



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 640 (patched)
<https://reviews.apache.org/r/72499/#comment309421>

LOG.debug



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 646-651 (patched)
<https://reviews.apache.org/r/72499/#comment309422>

Is this logic needed? You already have a value loader in get(), which must 
return a UGI, so it can't be null.



llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
Lines 663 (patched)
<https://reviews.apache.org/r/72499/#comment309424>

If it's null, then it's a programming error. Better to not do this null check 
and offer without checking for null.


- Ashutosh Chauhan



Re: Review Request 72503: HIVE-23449: LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72503/#review220734
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java
Lines 197 (patched)
<https://reviews.apache.org/r/72503/#comment309406>

Ah, I see. Makes sense. 
Can you add a comment during commit in ContainerRunnerImpl.java explaining the 
usage of the supplier? As you can see, it wasn't apparent to me :)
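
For readers following along, one common way to defer an expensive construction behind a `Supplier` is sketched below. This is an assumption about the general technique, not the code from the patch: `LazyConfSketch` and `memoize` are hypothetical names, and the thread does not show whether the patch caches the value this way.

```java
import java.util.function.Supplier;

/** Illustration only: a memoizing Supplier, not the code from the patch. */
final class LazyConfSketch {

    /** Wraps a factory so the value is built on the first get() and cached afterwards. */
    static <T> Supplier<T> memoize(Supplier<T> factory) {
        return new Supplier<T>() {
            private T value;
            private boolean built;

            @Override
            public synchronized T get() {
                if (!built) {
                    value = factory.get();
                    built = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Supplier<String> conf = memoize(() -> {
            System.out.println("building conf");
            return "conf";
        });
        // Nothing has been built yet; the factory runs only on the first get().
        conf.get();
        conf.get();
    }
}
```

The saving comes from callers that never invoke `get()`: they pay nothing, whereas passing an already constructed object forces the cost on every call path.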


- Ashutosh Chauhan


On May 13, 2020, 2:28 a.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72503/
> ---
> 
> (Updated May 13, 2020, 2:28 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For short jobs, submitWork() becomes hotpath. Patch tries to lazy load conf 
> creation and also gets rid of dir creations (which needs to be enabled only 
> when DirWatcher is enabled in ShuffleHandler)
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java
>  36192520e3 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
>  aff2c2ec39 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TestContainerRunnerImpl.java
>  8ae00b9c87 
> 
> 
> Diff: https://reviews.apache.org/r/72503/diff/2/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> HIVE-23449.2.patch
>   
> https://reviews.apache.org/media/uploaded/files/2020/05/13/36724ea5-265e-4e61-98a4-18af913b5b48__HIVE-23449.2.patch
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>

Re: Review Request 72503: HIVE-23449: LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72503/#review220732
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java
Lines 197 (patched)
<https://reviews.apache.org/r/72503/#comment309404>

I still don't follow. What does conf.get() save here compared with passing 
the already constructed conf from the caller?


- Ashutosh Chauhan



Re: Review Request 72503: HIVE-23449: LLAP: Reduce mkdir and config creations in submitWork hotpath

2020-05-12 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72503/#review220729
---




llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java
Line 257 (original), 259 (patched)
<https://reviews.apache.org/r/72503/#comment309402>

I am not sure how memoization helps here. This conf.get() will always be 
called for this task execution. So, we might as well pass in conf here. Is 
there something we are saving by memoization?


- Ashutosh Chauhan


On May 12, 2020, 9:59 p.m., Rajesh Balamohan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72503/
> ---
> 
> (Updated May 12, 2020, 9:59 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For short jobs, submitWork() becomes a hot path. The patch lazily creates the 
> conf and also gets rid of directory creations (which need to be enabled only 
> when DirWatcher is enabled in ShuffleHandler).
> 
> 
> Diffs
> -
> 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/ContainerRunnerImpl.java
>  6a13b55e69 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/QueryTracker.java
>  eae8e08540 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java
>  36192520e3 
>   
> llap-server/src/java/org/apache/hadoop/hive/llap/shufflehandler/ShuffleHandler.java
>  aff2c2ec39 
>   
> llap-server/src/test/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorTestHelpers.java
>  50dec4759e 
> 
> 
> Diff: https://reviews.apache.org/r/72503/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Rajesh Balamohan
> 
>

[jira] [Created] (HIVE-23447) Avoid sending configs to tasks and AM which are only relevant for HS2

2020-05-11 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23447:
---

 Summary: Avoid sending configs to tasks and AM which are only 
relevant for HS2
 Key: HIVE-23447
 URL: https://issues.apache.org/jira/browse/HIVE-23447
 Project: Hive
  Issue Type: Task
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


There are many configs which are only relevant for HS2. The longer-term fix for 
this is to split HiveConf into multiple config classes, each relevant only for 
HS2, HMS, AM or tasks, and to use each object only in the process where it's 
relevant. In the interim, we can avoid sending across configs with large value 
strings.
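
As an illustration of the interim idea only, the sketch below filters a plain
key/value map before it would be shipped to tasks. It does not use HiveConf;
the class name ConfTrimDemo, the method trimForTasks, the example keys and the
length threshold are all hypothetical and chosen for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ConfTrimDemo {
  /**
   * Returns a copy of conf without the listed server-only keys and without any
   * value longer than maxValueLength characters. The input map is not modified.
   */
  static Map<String, String> trimForTasks(
      Map<String, String> conf, Set<String> serverOnlyKeys, int maxValueLength) {
    Map<String, String> trimmed = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      boolean serverOnly = serverOnlyKeys.contains(e.getKey());
      boolean tooLarge = e.getValue().length() > maxValueLength;
      if (!serverOnly && !tooLarge) {
        trimmed.put(e.getKey(), e.getValue());
      }
    }
    return trimmed;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("example.task.setting", "42");         // hypothetical key
    conf.put("example.server.only.setting", "x");   // hypothetical key
    conf.put("example.large.setting",               // hypothetical key, 5000 chars
        new String(new char[5000]).replace('\0', 'a'));
    Set<String> serverOnly = Collections.singleton("example.server.only.setting");
    System.out.println(trimForTasks(conf, serverOnly, 1024).keySet());
  }
}
```

A size threshold alone could drop a large value that tasks do need, so a real
change would presumably rely on an explicit list of keys.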




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23445) Remove mapreduce.workflow.* configs

2020-05-11 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23445:
---

 Summary: Remove mapreduce.workflow.* configs
 Key: HIVE-23445
 URL: https://issues.apache.org/jira/browse/HIVE-23445
 Project: Hive
  Issue Type: Task
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


These configs were introduced in HIVE-3708 in the hope of developing tools to 
visualize and monitor multiple MR jobs from Hive, back in the day when MR was 
used. Even then, in spite of these config additions, no such tools were 
developed AFAIK. And now MR is hardly ever used. We can get rid of these 
configs. That will help reduce the size of HiveConf a bit.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] Should we release Hive Storage API 2.7.2-rc1?

2020-05-09 Thread Ashutosh Chauhan

+1
Thanks Gopal for the script.

Ashutosh



On Fri, May 8, 2020 at 5:40 PM Gopal V  wrote:

> Hi,
>
> Validated checksums, signatures, built and verified against latest orc.
>
> Since I was lazy enough to automate this, here's a script for others who
> might not have voted (or want to add things to this).
>
> https://github.com/t3rmin4t0r/verify-asf-releases + make -f
> Makefile.storage-api
>
> should do all the checks I did, unattended.
>
> Cheers,
> Gopal
>
> On 5/5/20 7:49 AM, Jesus Camacho Rodriguez wrote:
> > All,
> >
> > I'd like to make a storage-api release with HIVE-22959 and HIVE-23215
> > <https://issues.apache.org/jira/browse/HIVE-23215> in it.
> >
> > Should we release the following artifacts as Hive Storage API 2.7.2?
> >
> > tar: http://home.apache.org/~jcamacho/hive-storage-2.7.2/
> > tag:
> https://github.com/apache/hive/releases/tag/storage-release-2.7.2-rc1
> > jiras: https://issues.apache.org/jira/projects/HIVE/versions/12347828
> >
> > Thanks!
> >
> > -Jesús
> >
>

Re: Review Request 72437: Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-05-05 Thread Ashutosh Chauhan



> On May 5, 2020, 5:04 a.m., Ashutosh Chauhan wrote:
> > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
> > Line 832 (original), 830 (patched)
> > <https://reviews.apache.org/r/72437/diff/3/?file=2230109#file2230109line839>
> >
> > isView only used for this check here, which can be eliminated.
> 
> Attila Magyar wrote:
> Not sure what you mean by eliminating it. Removing it altogether?

Yes, remove this if(); its only purpose is to throw an exception. We can remove 
it to save the RDBMS query for isViewTable().


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72437/#review220616
---


On April 27, 2020, 9:15 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72437/
> ---
> 
> (Updated April 27, 2020, 9:15 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Rajesh Balamohan, and Vineet Garg.
> 
> 
> Bugs: HIVE-23282
> https://issues.apache.org/jira/browse/HIVE-23282
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
> 
>  
> 
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes the call a 
> lot more expensive. It would be good to check if MTable itself can be used 
> instead of Table.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  4f58cd91efc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  d1558876f14 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  53b7a67a429 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
>  9834883f00f 
> 
> 
> Diff: https://reviews.apache.org/r/72437/diff/4/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>

Re: Review Request 72437: Reduce number of DB calls in ObjectStore::getPartitionsByExprInternal

2020-05-04 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72437/#review220616
---




ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Line 1548 (original), 1548 (patched)
<https://reviews.apache.org/r/72437/#comment309172>

assert table != null?
At this point there is no advantage in passing null for partition keys, 
since if it's null then retrieving partitions of this table won't make sense.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Line 832 (original), 830 (patched)
<https://reviews.apache.org/r/72437/#comment309177>

isView only used for this check here, which can be eliminated.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
Lines 1017 (patched)
<https://reviews.apache.org/r/72437/#comment309173>

assert partitionKeys != null
If they are null, retrieving partitions won't make sense.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 3473-3474 (patched)
<https://reviews.apache.org/r/72437/#comment309174>

This needs to be avoided, else it will generate additional RDBMS queries. 
It's only needed for getPartitionsViaSqlFilter() and not on other paths. And 
even there it's needed only to make checks which can be avoided altogether. So 
there is no need for isViewTable anywhere.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 3832 (patched)
<https://reviews.apache.org/r/72437/#comment309175>

may remove this.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 4077-4078 (patched)
<https://reviews.apache.org/r/72437/#comment309176>

This needs to be avoided, else it will generate additional RDBMS queries. 
It's only needed for one path, and even there it's only needed to make checks. 
isViewTable can be eliminated entirely.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 4287 (patched)
<https://reviews.apache.org/r/72437/#comment309178>

LOG.debug



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 4114 (original), 4307 (patched)
<https://reviews.apache.org/r/72437/#comment309179>

LOG.debug


- Ashutosh Chauhan


On April 27, 2020, 9:15 a.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72437/
> ---
> 
> (Updated April 27, 2020, 9:15 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Rajesh Balamohan, and Vineet Garg.
> 
> 
> Bugs: HIVE-23282
> https://issues.apache.org/jira/browse/HIVE-23282
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> ObjectStore::getPartitionsByExprInternal internally uses Table information 
> for getting partitionKeys, table, catalog name.
> 
>  
> 
> For this, it ends up populating the entire table data from the DB (including 
> skew columns, parameters, sort and bucket cols, etc.). This makes the call a 
> lot more expensive. It would be good to check if MTable itself can be used 
> instead of Table.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  4f58cd91efc 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  d1558876f14 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  53b7a67a429 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
>  9834883f00f 
> 
> 
> Diff: https://reviews.apache.org/r/72437/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>

Review Request 72463: HIVE-23298

2020-05-03 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72463/
---

Review request for hive and Jesús Camacho Rodríguez.


Bugs: HIVE-23298
https://issues.apache.org/jira/browse/HIVE-23298


Repository: hive-git


Description
---

rs_dedup


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java da277d058f 
  ql/src/test/results/clientpositive/llap/auto_join18.q.out 505ce8ca5b 
  ql/src/test/results/clientpositive/llap/auto_join18_multi_distinct.q.out 
c77bed7072 
  ql/src/test/results/clientpositive/llap/bucket2.q.out 9b82a96fc1 
  ql/src/test/results/clientpositive/llap/bucket4.q.out ea9dc76a3a 
  ql/src/test/results/clientpositive/llap/bucket_num_reducers2.q.out 17f30f9e58 
  ql/src/test/results/clientpositive/llap/check_constraint.q.out e4fe16427f 
  ql/src/test/results/clientpositive/llap/disable_merge_for_bucketing.q.out 
389a5f2769 
  ql/src/test/results/clientpositive/llap/distinct_groupby.q.out b396e454c7 
  ql/src/test/results/clientpositive/llap/distinct_stats.q.out f0daa4c4f1 
  ql/src/test/results/clientpositive/llap/dynpart_sort_optimization2.q.out 
bb3b6c39f0 
  ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out 
6abd6f3c82 
  ql/src/test/results/clientpositive/llap/except_all.q.out 4c2498f5a8 
  ql/src/test/results/clientpositive/llap/except_distinct.q.out 47f45c5cdd 
  ql/src/test/results/clientpositive/llap/explainuser_1.q.out 7f0ce5a9c7 
  ql/src/test/results/clientpositive/llap/explainuser_2.q.out 6f275c6ecf 
  ql/src/test/results/clientpositive/llap/groupby3_map.q.out 0bef509f54 
  ql/src/test/results/clientpositive/llap/groupby3_map_multi_distinct.q.out 
2290c22814 
  ql/src/test/results/clientpositive/llap/groupby3_map_skew.q.out 258e54591e 
  ql/src/test/results/clientpositive/llap/groupby4_map.q.out 7d4d7a0524 
  ql/src/test/results/clientpositive/llap/groupby4_map_skew.q.out eb53e25441 
  ql/src/test/results/clientpositive/llap/groupby5_map.q.out ddd0557df2 
  ql/src/test/results/clientpositive/llap/groupby5_map_skew.q.out b6d681b3f1 
  ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 
b7355fb2d2 
  ql/src/test/results/clientpositive/llap/intersect_all.q.out 549cca487a 
  ql/src/test/results/clientpositive/llap/intersect_distinct.q.out 950bc4b68c 
  ql/src/test/results/clientpositive/llap/limit_pushdown.q.out 63e524d5d4 
  ql/src/test/results/clientpositive/llap/mrr.q.out 628f91af1e 
  ql/src/test/results/clientpositive/llap/offset_limit_ppd_optimizer.q.out 
208646bba1 
  ql/src/test/results/clientpositive/llap/ptf.q.out c678e64902 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate.q.out 9df57473f4 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out 
d15ea89888 
  ql/src/test/results/clientpositive/llap/reducesink_dedup.q.out 84c8223214 
  ql/src/test/results/clientpositive/llap/subquery_ANY.q.out 8fa69c5aaf 
  ql/src/test/results/clientpositive/llap/subquery_in.q.out 60522c838b 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 3bb3a042a0 
  ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 8fab16789b 
  ql/src/test/results/clientpositive/llap/subquery_select.q.out 311cee743d 
  ql/src/test/results/clientpositive/llap/tez_dml.q.out d716b63012 
  ql/src/test/results/clientpositive/llap/tez_union2.q.out 762a2a51d0 
  ql/src/test/results/clientpositive/llap/tez_union_multiinsert.q.out 
d5bc1790d4 
  ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out aa92f46f17 
  ql/src/test/results/clientpositive/llap/unionDistinct_3.q.out 69c43706b2 
  ql/src/test/results/clientpositive/llap/vector_decimal_6.q.out e899da5c1f 
  ql/src/test/results/clientpositive/llap/vector_groupby_reduce.q.out 
e74bc44680 
  ql/src/test/results/clientpositive/llap/vector_outer_reference_windowed.q.out 
cb086bd5a3 
  ql/src/test/results/clientpositive/llap/vector_ptf_1.q.out d4d22d05d8 
  ql/src/test/results/clientpositive/llap/vector_windowing.q.out ca3c6337bf 
  ql/src/test/results/clientpositive/llap/vectorization_limit.q.out 36276e1fc9 
  ql/src/test/results/clientpositive/llap/vectorized_ptf.q.out 640e8f0dc3 
  ql/src/test/results/clientpositive/perf/tez/constraints/query51.q.out 
257cb58c50 
  ql/src/test/results/clientpositive/perf/tez/constraints/query53.q.out 
06726ba4a1 
  ql/src/test/results/clientpositive/perf/tez/constraints/query63.q.out 
cdcd316388 
  ql/src/test/results/clientpositive/perf/tez/query51.q.out 8e3c53dfc9 
  ql/src/test/results/clientpositive/perf/tez/query53.q.out 2d2c0c374e 
  ql/src/test/results/clientpositive/perf/tez/query63.q.out 6d7a54a808 


Diff: https://reviews.apache.org/r/72463/diff/1/


Testing
---


Thanks,

Ashutosh Chauhan

[jira] [Created] (HIVE-23287) Reduce dependency on icu4j

2020-04-23 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23287:
---

 Summary: Reduce dependency on icu4j
 Key: HIVE-23287
 URL: https://issues.apache.org/jira/browse/HIVE-23287
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Brought in transitively via druid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23278) Remove dependency on bouncycastle

2020-04-23 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23278:
---

 Summary: Remove dependency on bouncycastle
 Key: HIVE-23278
 URL: https://issues.apache.org/jira/browse/HIVE-23278
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23268) Eliminate beanutils transitive dependency

2020-04-22 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23268:
---

 Summary: Eliminate beanutils transitive dependency
 Key: HIVE-23268
 URL: https://issues.apache.org/jira/browse/HIVE-23268
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Transitively retrieved from hadoop-commons



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23267) Reduce dependency on groovy

2020-04-21 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23267:
---

 Summary: Reduce dependency on groovy
 Key: HIVE-23267
 URL: https://issues.apache.org/jira/browse/HIVE-23267
 Project: Hive
  Issue Type: Improvement
  Components: Query Planning
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Transitively pulled in where it's not needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23262) Remove dependency on activemq

2020-04-20 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23262:
---

 Summary: Remove dependency on activemq
 Key: HIVE-23262
 URL: https://issues.apache.org/jira/browse/HIVE-23262
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


ActiveMQ is a test dependency introduced to test the message bus feature in 
HCatalog. Even when it was first introduced, there were concerns about taking 
ActiveMQ as a dependency. 
https://issues.apache.org/jira/browse/HCATALOG-3?focusedCommentId=13035113=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13035113
AFAIK no one uses the message bus feature of HCatalog. We can remove it 
altogether. As a first step, remove the tests for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-23241) Reduce transitive dependencies

2020-04-18 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-23241:
---

 Summary: Reduce transitive dependencies
 Key: HIVE-23241
 URL: https://issues.apache.org/jira/browse/HIVE-23241
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] Should we release Hive Storage API 2.7.2-rc0?

2020-04-06 Thread Ashutosh Chauhan

+1
Built and ran a few tests locally.

On Mon, Mar 30, 2020 at 2:21 PM Owen O'Malley 
wrote:

> In evaluating this RC, I discovered HIVE-22959, which is the only patch in
> this RC.
>
> I'm uncomfortable with the API added by HIVE-22959, because it is
> duplicating a lot of the functionality from VectorizedRowBatch. I'll look
> at the motivating ORC-577 tomorrow, but for now I'm -1 on releasing it.
>
> .. Owen
>
> On Mon, Mar 30, 2020 at 1:18 PM Vineet G  wrote:
>
> > +1. Verified the signature, checksum and build.
> >
> > Vineet
> >
> > > On Mar 30, 2020, at 1:20 AM, Zoltan Haindrich  wrote:
> > >
> > > +1
> > >
> > > * verified checksum/etc
> > > * built and run tests locally
> > > * built orc/master against it
> > > * there doesn't seem to be a staged nexus repo for this - but it seems
> > like earlier releases also doesn't had that; meanwhile
> >
> https://repo.maven.apache.org/maven2/org/apache/hive/hive-storage-api/2.7.1/
> > seems to have them ; I assume it will be also uploaded there along with
> > sources/etc
> > >
> > >
> > > On 3/24/20 9:33 PM, Jesus Camacho Rodriguez wrote:
> > >> All,
> > >> I'd like to make a storage-api release with HIVE-22959
> > >>  in it.
> > >> Should we release the following artifacts as Hive Storage API 2.7.2?
> > >> tar: http://home.apache.org/~jcamacho/hive-storage-2.7.2/
> > >> tag:
> > https://github.com/apache/hive/releases/tag/storage-release-2.7.2-rc0
> > >> jiras: https://issues.apache.org/jira/projects/HIVE/versions/12347828
> > >> Thanks!
> > >> -Jesús
> >
> >
>

Re: Review Request 72200: TopN Key efficiency check might disable filter too soon

2020-03-05 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72200/#review219806
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Line 2421 (original), 2421 (patched)
<https://reviews.apache.org/r/72200/#comment308037>

I think we should check at least 100K rows before turning this off, so 
checking after 10K batches makes more sense to me. So, let's make 10K the 
default here.


- Ashutosh Chauhan


On March 5, 2020, 2:22 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72200/
> ---
> 
> (Updated March 5, 2020, 2:22 p.m.)
> 
> 
> Review request for hive, Gopal V, Jesús Camacho Rodríguez, Krisztian Kasa, 
> and Rajesh Balamohan.
> 
> 
> Bugs: HIVE-22982
> https://issues.apache.org/jira/browse/HIVE-22982
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The check is triggered after every n batches, but there can be multiple 
> filters, one for each partition. Some filters might have less data than the 
> others.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 7ea2de9019c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/TopNKeyOperator.java f09867bb4e8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorTopNKeyOperator.java 
> 0f8eb173c66 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestTopNKeyFilter.java 
> a91bc7354a7 
> 
> 
> Diff: https://reviews.apache.org/r/72200/diff/1/
> 
> 
> Testing
> ---
> 
> manually
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>

[Announce] New committer : Denys Kuzmenko

2020-02-14 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Denys Kuzmenko
to become a committer, and we are pleased to announce that he has accepted.

Denys, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Thanks,
Ashutosh

[Announce] New committer : Laszlo Pinter

2020-02-10 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Laszlo Pinter
to become a committer, and we are pleased to announce that he has accepted.

Laszlo, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Thanks,
Ashutosh

Welcome Anishek To Apache Hive PMC

2020-02-10 Thread Ashutosh Chauhan

I'm happy to announce Anishek Agarwal as the latest addition to the Apache
Hive Project Management Committee (PMC).

He has been an important committer to the project and an active member of the
community helping advance Apache Hive.

Congratulations, and thank you for your hard work!

Thanks,
Ashutosh

Welcome Mahesh to Hive PMC

2020-02-10 Thread Ashutosh Chauhan

Hi all,

It's an honor to announce that Apache Hive PMC has recently voted to invite
Mahesh Kumar Behera as a new Hive PMC member. Mahesh is a long time Hive
contributor and committer, and has made significant contributions to Hive.
Please join me in congratulating him and looking forward to the bigger role
that he will play in the Apache Hive project.

Thanks,
Ashutosh

[jira] [Created] (HIVE-22745) Config option to turn off read locks

2020-01-17 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-22745:
---

 Summary: Config option to turn off read locks
 Key: HIVE-22745
 URL: https://issues.apache.org/jira/browse/HIVE-22745
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Ashutosh Chauhan


Although it's not recommended, this option may be exercised in 
performance-critical scenarios. We have observed lock acquisition taking a long 
time on heavily loaded systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-22697) Remove expensive RDBMS queries in lock acquisition logic

2020-01-06 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-22697:
---

 Summary: Remove expensive RDBMS queries in lock acquisition logic
 Key: HIVE-22697
 URL: https://issues.apache.org/jira/browse/HIVE-22697
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Gopal Vijayaraghavan


There are queries like
"update NEXT_LOCK_ID set nl_next = 163084841;" which can be skipped entirely:
the ACID implementation doesn't require globally unique lock IDs
because locks are marked as txn.id.
In one particular instance, we observed this logic taking multiple seconds in 
MySQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: Review Request 71707: Performance degradation on single row inserts

2019-11-05 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71707/#review218518
---




standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
Line 323 (original), 321 (patched)
<https://reviews.apache.org/r/71707/#comment306261>

Can you please also make a similar change to 
common/src/java/org/apache/hadoop/hive/common/FileUtils.java::listStatusRecursively()
so that method also benefits from this change?



standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
Line 331 (original), 324 (patched)
<https://reviews.apache.org/r/71707/#comment306259>

you may use BlobStorageUtils::isBlobStorageFileSystem() here.



standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
Lines 378 (patched)
<https://reviews.apache.org/r/71707/#comment306260>

BlobStorageUtils::isBlobStorageFileSystem() instead


- Ashutosh Chauhan


On Nov. 5, 2019, 3:32 p.m., Attila Magyar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71707/
> ---
> 
> (Updated Nov. 5, 2019, 3:32 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Peter Vary, and Slim Bouguerra.
> 
> 
> Bugs: HIVE-22411
> https://issues.apache.org/jira/browse/HIVE-22411
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Executing single insert statements on a transactional table affects write 
> performance on an S3 file system. Each insert creates a new delta directory. 
> After each insert, Hive calculates statistics such as the number of files in 
> the table and the total size of the table. To calculate these, it traverses 
> the directory recursively. During the recursion, a separate listStatus call 
> is executed for each path. In the end, the more delta directories you have, 
> the more time it takes to calculate the statistics.
> 
> Therefore insertion time goes up linearly.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java
>  38e843aeacf 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/utils/FileUtils.java
>  bf206fffc26 
> 
> 
> Diff: https://reviews.apache.org/r/71707/diff/2/
> 
> 
> Testing
> ---
> 
> measured and plotted insertion time
> 
> 
> Thanks,
> 
> Attila Magyar
> 
>
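
The cost pattern in the description quoted above can be illustrated with a toy
model. The sketch below does not use the Hadoop FileSystem API and is not the
patch under review; ListingCostDemo and its in-memory "file system" are
hypothetical. It only counts how many list requests a per-directory recursive
walk issues compared with a single flat prefix listing, assuming one file per
delta directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ListingCostDemo {
  /** Toy file system: directory path -> child names; a trailing '/' marks a subdirectory. */
  final Map<String, List<String>> dirs = new TreeMap<>();
  int listCalls;  // number of simulated list requests issued so far

  /** Builds a table directory holding n delta directories with one file each. */
  static ListingCostDemo tableWithDeltas(int n) {
    ListingCostDemo fs = new ListingCostDemo();
    List<String> top = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      String delta = "delta_" + i + "/";
      top.add(delta);
      fs.dirs.put("table/" + delta, Collections.singletonList("bucket_00000"));
    }
    fs.dirs.put("table/", top);
    return fs;
  }

  /** Issues one list request per directory, as a naive recursive walk does. */
  int countFilesPerDirectory(String dir) {
    listCalls++;
    int files = 0;
    for (String child : dirs.get(dir)) {
      files += child.endsWith("/") ? countFilesPerDirectory(dir + child) : 1;
    }
    return files;
  }

  /** Issues a single request that scans every entry under the prefix. */
  int countFilesFlat(String prefix) {
    listCalls++;
    int files = 0;
    for (Map.Entry<String, List<String>> e : dirs.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        for (String child : e.getValue()) {
          if (!child.endsWith("/")) {
            files++;
          }
        }
      }
    }
    return files;
  }

  public static void main(String[] args) {
    for (int n : new int[] {10, 100, 1000}) {
      ListingCostDemo fs = tableWithDeltas(n);
      fs.countFilesPerDirectory("table/");
      int perDirectoryCalls = fs.listCalls;
      fs.listCalls = 0;
      fs.countFilesFlat("table/");
      System.out.println(n + " deltas: " + perDirectoryCalls
          + " list calls per-directory vs " + fs.listCalls + " flat");
    }
  }
}
```

In this model the per-directory walk needs one request for the table plus one
per delta directory, which matches the linear growth reported in the
description; on an object store each of those requests is a remote call.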

[Announce] New committer : David Mollitor

2019-09-11 Thread Ashutosh Chauhan

Hi,

Apache Hive's Project Management Committee (PMC) has invited David Mollitor
to become a committer, and we are pleased to announce that he has accepted.

David, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

[jira] [Created] (HIVE-22151) Turn off hybrid grace hash join by default

2019-08-27 Thread Ashutosh Chauhan (Jira)

Ashutosh Chauhan created HIVE-22151:
---

 Summary: Turn off hybrid grace hash join by default
 Key: HIVE-22151
 URL: https://issues.apache.org/jira/browse/HIVE-22151
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Created] (HIVE-22112) update jackson version in disconnected poms

2019-08-14 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-22112:
---

 Summary: update jackson version in disconnected poms 
 Key: HIVE-22112
 URL: https://issues.apache.org/jira/browse/HIVE-22112
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


was updated in top level pom via HIVE-22089



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Created] (HIVE-22090) Upgrade jetty to 9.3.27

2019-08-07 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-22090:
---

 Summary: Upgrade jetty to 9.3.27
 Key: HIVE-22090
 URL: https://issues.apache.org/jira/browse/HIVE-22090
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Created] (HIVE-22089) Upgrade jackson to 2.9.9

2019-08-07 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-22089:
---

 Summary: Upgrade jackson to 2.9.9
 Key: HIVE-22089
 URL: https://issues.apache.org/jira/browse/HIVE-22089
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.1.1, 3.1.0, 3.0.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[ANNOUNCE] New committer: Rajkumar Singh

2019-07-25 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Rajkumar Singh
to become a committer, and we are pleased to announce that he has accepted.

Raj, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

[ANNOUNCE] New committer: Miklos Gergely

2019-07-15 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Miklos Gergely
to become a committer, and we are pleased to announce that he has accepted.

Miklos, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

[jira] [Created] (HIVE-21815) Stats in ORC file are parsed twice

2019-05-31 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-21815:
---

 Summary: Stats in ORC file are parsed twice
 Key: HIVE-21815
 URL: https://issues.apache.org/jira/browse/HIVE-21815
 Project: Hive
  Issue Type: Improvement
  Components: ORC
Reporter: Gopal V


ORC record reader unnecessarily parses stats twice




[ANNOUNCE] New committer: Laszlo Bodor

2019-04-14 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Laszlo
Bodor to become a committer, and we are pleased to announce that he has
accepted.

Laszlo, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

[jira] [Created] (HIVE-21427) Syslog serde

2019-03-11 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-21427:
---

 Summary: Syslog serde
 Key: HIVE-21427
 URL: https://issues.apache.org/jira/browse/HIVE-21427
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Reporter: Ashutosh Chauhan


It will be useful to read syslog generated log files in Hive




Avoiding merge commits

2019-03-11 Thread Ashutosh Chauhan

Hi,
With the advent of gitbox/GitHub integration, folks have started using merge
commits (inadvertently, I believe, by merging via GitHub). This causes issues
downstream: in my branch I can't cherry-pick and rebase, but instead
get merge history. So, I propose we ban merge commits.
Too radical?

Thanks,
Ashutosh

[jira] [Created] (HIVE-21341) Sensible defaults : hive.server2.idle.operation.timeout and hive.server2.idle.session.timeout are too high

2019-02-27 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-21341:
---

 Summary: Sensible defaults : hive.server2.idle.operation.timeout 
and hive.server2.idle.session.timeout are too high
 Key: HIVE-21341
 URL: https://issues.apache.org/jira/browse/HIVE-21341
 Project: Hive
  Issue Type: Improvement
  Components: Configuration, HiveServer2
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Defaults are too high, which results in extra resources being held too long in 
HS2 memory.




Re: Review Request 69918: HIVE-21001 Update to Calcite 1.18

2019-02-12 Thread Ashutosh Chauhan



> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > accumulo-handler/src/test/results/positive/accumulo_predicate_pushdown.q.out
> > Line 417 (original), 417 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124217#file2124217line417>
> >
> > This should further fold to key >= '90'Can you file a follow-up jira 
> > for this?
> 
> Zoltan Haindrich wrote:
> I think this might not be simplified further; because when key is null 
> the expression should be true.
> ```
> (key < '90') is not true
> (key >= '90') or key is null
> 
> not COALESCE((key < '90'),false)
> COALESCE((key >= '90'),true)
> ```

That's right. But it can be folded to key < '90' is false
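
To make the NULL handling in this exchange concrete, here is a minimal Java
sketch of SQL three-valued logic, with a null Boolean standing in for UNKNOWN.
The class and method names are hypothetical and are not Hive or Calcite APIs.
It checks that (key < '90') is not true agrees with (key >= '90') or key is
null for every input, while plain key >= '90' used as a filter differs when
key is NULL.

```java
// Sketch only: SQL three-valued logic modelled with Boolean, null == UNKNOWN.
// All names here are illustrative assumptions, not Hive or Calcite code.
final class ThreeValuedFold {
    // key < bound, which is UNKNOWN (null) when key is NULL.
    static Boolean lessThan(String key, String bound) {
        return key == null ? null : key.compareTo(bound) < 0;
    }

    // x IS NOT TRUE: holds for FALSE and for UNKNOWN.
    static boolean isNotTrue(Boolean x) {
        return x == null || !x;
    }

    // (key >= bound) OR key IS NULL
    static boolean geOrNull(String key, String bound) {
        return key == null || key.compareTo(bound) >= 0;
    }

    // Plain key >= bound used as a filter: a NULL key is rejected.
    static boolean geAsFilter(String key, String bound) {
        return key != null && key.compareTo(bound) >= 0;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"27", "90", "95", null}) {
            System.out.println(key
                + ": isNotTrue=" + isNotTrue(lessThan(key, "90"))
                + " geOrNull=" + geOrNull(key, "90")
                + " geAsFilter=" + geAsFilter(key, "90"));
        }
    }
}
```

The first two columns match on every row; only the NULL key separates them
from the plain comparison.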


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/constant_prop_3.q.out
> > Line 286 (original), 286 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124230#file2124230line286>
> >
> > New expression tree is longer compared to original. I guess we try to 
> > apply DeMorgan theorem here, but in this case its a net loss. Perhaps, we 
> > can add a (simple) logic which says if node count in expression tree grows 
> > after the application of theorem we throw away that.
> 
> Zoltan Haindrich wrote:
> simplification is too conservative in 1.18; see: CALCITE-2840

We shall make CALCITE-2840 a blocker for the 1.19 release since it's a regression.


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/fold_case.q.out
> > Line 21 (original), 21 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124238#file2124238line21>
> >
> > {{is true}} is redundant. Can we re-fold it?
> 
> Zoltan Haindrich wrote:
> yes; simplify currently not removes redundant IS X w.r.t. unknownAs mode 
> - I've noted this somewhere...
> 
> it's interesting that it earlier worked
> CALCITE-2838

We shall make CALCITE-2838 a blocker for the 1.19 release since it's a regression.


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/fold_eq_with_case_when.q.out
> > Line 50 (original), 50 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124239#file2124239line50>
> >
> > New Expression is more complex to evaluate. Can we refold this?
> 
> Zoltan Haindrich wrote:
> yes, this shouldn't happen; "ELSE NULL" should really be "ELSE FALSE"

Is this tracked in a jira?


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/fold_when.q.out
> > Line 227 (original), 227 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124240#file2124240line227>
> >
> > This {{is true}} also gets in a way of constant propagation.
> 
> Zoltan Haindrich wrote:
> CALCITE-2838

We shall make CALCITE-2838 a blocker for the 1.19 release since it's a regression.


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/perf/tez/query70.q.out
> > Line 113 (original), 113 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124400#file2124400line113>
> >
> > UDFToLong(0) should be folded. Can you file a follow-up jira for it?
> 
> Zoltan Haindrich wrote:
> yes; cast(null as string) also seems to be odd
> at the ast level it looks good - calcite doesn't seem to be leaving an 
> explicit cast there

Is this tracked in a jira?


> On Feb. 7, 2019, 9:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/ppd_udf_case.q.out
> > Line 50 (original), 49 (patched)
> > <https://reviews.apache.org/r/69918/diff/1/?file=2124414#file2124414line50>
> >
> > New expression is more expensive to evaluate.
> 
> Zoltan Haindrich wrote:
> this expression is false
> ```
> (null and (key = '27') is not true and (key = '38') is not true)) is true
> ```
> I'll take a look why it regressed
> 
> after some calcite fixes I've prepared so far...most probably CALCITE-2840

We shall make CALCITE-2840 a blocker for the 1.19 release since it's a regression.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69918/#review212637
---


On Feb. 7, 2019, 7:08 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org

Re: Review Request 69011: HIVE-20713: Use percentage for join conversion size thresholds

2019-02-08 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69011/#review212674
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2017 (patched)
<https://reviews.apache.org/r/69011/#comment298514>

Default value set by Ambari is 0.24. Shall we use 0.25 here instead of 0.3?

https://github.com/hortonworks/hdp_ambari_definitions/blob/AMBARI-2.7-maint/src/main/resources/stacks/HDP/3.0/services/YARN/service_advisor.py#L1416



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2022 (patched)
<https://reviews.apache.org/r/69011/#comment298515>

Currently factor = large table / executor memory. I think reducer count
should also be part of the equation, since it's memory per reducer that matters
here, i.e. large_table / (executor_memory * reducer_count). In a perfect world
where all our estimates meet reality this should be 1. That is, shuffle data is
perfectly divided across executors. This is also more intuitive to think about
for the end user: to be aggressive, set this to more than 1; to be
conservative, set it below 1. We can pick a default of 1, which is also more
intuitive than the current 10. This does assume reducer count info is available.
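
As a rough illustration of the ratio proposed above, the following Java sketch
computes large_table / (executor_memory * reducer_count). The class, method,
and parameter names are hypothetical and do not correspond to anything in
HiveConf or the patch under review; the sizes in main are made-up numbers.

```java
// Sketch only: the ratio suggested in the review comment.
// Names and inputs are illustrative assumptions, not Hive code.
final class JoinConversionFactor {
    // factor = largeTableBytes / (executorMemoryBytes * reducerCount)
    static double factor(long largeTableBytes, long executorMemoryBytes, int reducerCount) {
        if (executorMemoryBytes <= 0 || reducerCount <= 0) {
            throw new IllegalArgumentException("memory and reducer count must be positive");
        }
        return (double) largeTableBytes / ((double) executorMemoryBytes * reducerCount);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 40 GiB of shuffle data over 10 reducers with 4 GiB each: exactly divided.
        System.out.println(factor(40 * gib, 4 * gib, 10));
    }
}
```

A value of 1 here means the data exactly fills the combined reducer memory,
which is the "perfect world" case described in the comment.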



common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2038 (patched)
<https://reviews.apache.org/r/69011/#comment298516>

Shall we just delete these configs? Leaving them in the code doesn't serve any
purpose other than to confuse the code reader :)



ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java
Lines 60-62 (patched)
<https://reviews.apache.org/r/69011/#comment298517>

I didn't follow this logic. If tezcontainersize is < 0 then we pick
default_map_memory, and if that is < 0 then we pick it again. So you will
still end up with a negative value.
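
The concern can be sketched in Java as follows. This is my reading of the
comment, not the code in MemoryInfo.java: the method names, the parameters,
and the 1024 MB floor are all assumptions made up for illustration.

```java
// Sketch only: a fallback chain as described in the review comment, and one
// possible variant that cannot return a negative value. Names and the floor
// value are illustrative assumptions, not Hive code.
final class ContainerMemoryFallback {
    static final long ASSUMED_FLOOR_MB = 1024; // hypothetical last-resort value

    // The pattern the comment describes: the second fallback re-reads the
    // same setting, so a negative value survives.
    static long asDescribed(long tezContainerSizeMb, long defaultMapMemoryMb) {
        long mb = tezContainerSizeMb;
        if (mb < 0) {
            mb = defaultMapMemoryMb;
        }
        if (mb < 0) {
            mb = defaultMapMemoryMb; // still negative
        }
        return mb;
    }

    // A variant that ends the chain with a positive constant.
    static long withFloor(long tezContainerSizeMb, long defaultMapMemoryMb) {
        if (tezContainerSizeMb > 0) {
            return tezContainerSizeMb;
        }
        if (defaultMapMemoryMb > 0) {
            return defaultMapMemoryMb;
        }
        return ASSUMED_FLOOR_MB;
    }
}
```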



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java
Line 103 (original), 104 (patched)
<https://reviews.apache.org/r/69011/#comment298518>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
Lines 370 (patched)
<https://reviews.apache.org/r/69011/#comment298519>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
Line 1577 (original), 1580 (patched)
<https://reviews.apache.org/r/69011/#comment298520>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
Lines 435 (patched)
<https://reviews.apache.org/r/69011/#comment298521>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/MemoryDecider.java
Lines 79 (patched)
<https://reviews.apache.org/r/69011/#comment298522>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
Line 194 (original), 195 (patched)
<https://reviews.apache.org/r/69011/#comment298523>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Line 405 (original), 406 (patched)
<https://reviews.apache.org/r/69011/#comment298524>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 600 (patched)
<https://reviews.apache.org/r/69011/#comment298525>

LOG.debug


- Ashutosh Chauhan


On Oct. 12, 2018, 11:41 p.m., Prasanth_J wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69011/
> -------
> 
> (Updated Oct. 12, 2018, 11:41 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20713
> https://issues.apache.org/jira/browse/HIVE-20713
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20713: Use percentage for join conversion size thresholds
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> cc6239c3135714fb65aa1afc2882852460b68b37 
>   data/conf/hive-site.xml 0daf9adc717bc1c4413d2e34691c26a3e2585c77 
>   data/conf/llap/hive-site.xml 44ca6c9daf092a35f1c58c26dfa3575c303464ce 
>   data/conf/tez/hive-site.xml 236adc7087b43f4e9ab95b2fa57436cf75c679aa 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 40dd992455f2fa6bae85d9d02338bc820a370ebe 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MemoryInfo.java PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java
>  a6b0dbc0dc956d81d027f08a55fbdf0ca452638f 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java
>  54377428eafdb79e1bbdc8a182eafb46f8febd23 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoad

Re: Review Request 69918: HIVE-21001 Update to Calcite 1.18

2019-02-07 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69918/#review212637
---




accumulo-handler/src/test/results/positive/accumulo_predicate_pushdown.q.out
Line 417 (original), 417 (patched)
<https://reviews.apache.org/r/69918/#comment298468>

This should further fold to key >= '90'. Can you file a follow-up jira for
this?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java
Lines 156 (patched)
<https://reviews.apache.org/r/69918/#comment298469>

Can you add a comment for this, since this is counter-intuitive.



ql/src/test/results/clientpositive/cbo_rp_simple_select.q.out
Line 902 (original), 900 (patched)
<https://reviews.apache.org/r/69918/#comment298470>

We lost filterExpr on TableScan; this will prevent pushing filters down to
ORC. Can you investigate this and file a follow-up jira for it?



ql/src/test/results/clientpositive/constant_prop_3.q.out
Line 286 (original), 286 (patched)
<https://reviews.apache.org/r/69918/#comment298471>

New expression tree is longer compared to the original. I guess we try to apply
De Morgan's theorem here, but in this case it's a net loss. Perhaps we can add
(simple) logic which says that if the node count in the expression tree grows
after the application of the theorem, we throw the result away.
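
The node-count guard suggested above could look roughly like this in Java.
It is a toy expression tree, not Calcite's RexNode or Hive's ExprNodeDesc;
the Expr and DeMorganGuard classes and their methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy boolean expression tree with a size measure.
final class Expr {
    final String op;        // "AND", "OR", "NOT", or a leaf name
    final List<Expr> args;

    Expr(String op, Expr... args) {
        this.op = op;
        this.args = List.of(args);
    }

    int nodeCount() {
        int n = 1;
        for (Expr a : args) {
            n += a.nodeCount();
        }
        return n;
    }
}

// Sketch only: apply De Morgan, but keep the result only if it is not larger.
final class DeMorganGuard {
    // Negate an expression, collapsing a double negation.
    static Expr not(Expr e) {
        return e.op.equals("NOT") ? e.args.get(0) : new Expr("NOT", e);
    }

    // NOT(AND(x...)) -> OR(NOT x...), NOT(OR(x...)) -> AND(NOT x...)
    static Expr deMorgan(Expr e) {
        if (!e.op.equals("NOT")) {
            return e;
        }
        Expr inner = e.args.get(0);
        if (!inner.op.equals("AND") && !inner.op.equals("OR")) {
            return e;
        }
        List<Expr> negated = new ArrayList<>();
        for (Expr a : inner.args) {
            negated.add(not(a));
        }
        String flipped = inner.op.equals("AND") ? "OR" : "AND";
        return new Expr(flipped, negated.toArray(new Expr[0]));
    }

    // Discard the rewrite when the tree grew.
    static Expr rewriteIfNotLarger(Expr e) {
        Expr rewritten = deMorgan(e);
        return rewritten.nodeCount() <= e.nodeCount() ? rewritten : e;
    }
}
```

With this guard, NOT(AND(a, b)) is left alone because the rewrite would add a
node, while NOT(AND(NOT a, NOT b)) is accepted because it shrinks to OR(a, b).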



ql/src/test/results/clientpositive/constprog_when_case.q.out
Line 53 (original), 53 (patched)
<https://reviews.apache.org/r/69918/#comment298472>

Same comment as previous.



ql/src/test/results/clientpositive/fold_case.q.out
Line 21 (original), 21 (patched)
<https://reviews.apache.org/r/69918/#comment298473>

{{is true}} is redundant. Can we re-fold it?



ql/src/test/results/clientpositive/fold_eq_with_case_when.q.out
Line 50 (original), 50 (patched)
<https://reviews.apache.org/r/69918/#comment298474>

New Expression is more complex to evaluate. Can we refold this?



ql/src/test/results/clientpositive/fold_when.q.out
Line 227 (original), 227 (patched)
<https://reviews.apache.org/r/69918/#comment298475>

This {{is true}} also gets in the way of constant propagation.



ql/src/test/results/clientpositive/list_bucket_dml_14.q.out
Line 300 (original), 300 (patched)
<https://reviews.apache.org/r/69918/#comment298476>

Although this doesn't affect correctness, Hive does make a distinction
between string and varchar. It would have been useful to retain this cast as
string since that is what is executed by Hive.



ql/src/test/results/clientpositive/llap/subquery_multi.q.out
Lines 1751-1755 (original), 1751-1757 (patched)
<https://reviews.apache.org/r/69918/#comment298480>

This looks like a worse plan with an extra join. Can you investigate this?



ql/src/test/results/clientpositive/llap/subquery_multi.q.out
Lines 2312-2313 (patched)
<https://reviews.apache.org/r/69918/#comment298481>

Worse plan than earlier.



ql/src/test/results/clientpositive/perf/tez/cbo_query13.q.out
Lines 117-121 (original), 117-123 (patched)
<https://reviews.apache.org/r/69918/#comment298484>

Looks like join order has changed. Is new order better?



ql/src/test/results/clientpositive/perf/tez/constraints/cbo_ext_query1.q.out
Lines 62-65 (original), 62-65 (patched)
<https://reviews.apache.org/r/69918/#comment298486>

Join order changed. Is new order better?



ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query6.q.out
Lines 68-70 (original), 68-70 (patched)
<https://reviews.apache.org/r/69918/#comment298487>

Is new join order better?



ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query64.q.out
Line 301 (original), 301 (patched)
<https://reviews.apache.org/r/69918/#comment298488>

Cast should get folded?



ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query78.q.out
Lines 150-154 (original), 147-160 (patched)
<https://reviews.apache.org/r/69918/#comment298489>

Is new join order better?



ql/src/test/results/clientpositive/perf/tez/query70.q.out
Line 113 (original), 113 (patched)
<https://reviews.apache.org/r/69918/#comment298485>

UDFToLong(0) should be folded. Can you file a follow-up jira for it?



ql/src/test/results/clientpositive/ppd_udf_case.q.out
Line 50 (original), 49 (patched)
<https://reviews.apache.org/r/69918/#comment298478>

New expression is more expensive to evaluate.



ql/src/test/results/clientpositive/ppd_udf_case.q.out
Line 214 (original), 215-217 (patched)
<https://reviews.apache.org/r/69918/#comment298479>

Constant propagation is also broken here.


- Ashutosh Chauhan


On Feb. 7, 2019, 7:08 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache

[jira] [Created] (HIVE-21189) hive.merge.nway.joins should default to false

2019-01-30 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-21189:
---

 Summary: hive.merge.nway.joins should default to false
 Key: HIVE-21189
 URL: https://issues.apache.org/jira/browse/HIVE-21189
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan







Re: Review Request 69562: HIVE-16957

2018-12-21 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69562/#review211502
---




ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java
Lines 110 (patched)
<https://reviews.apache.org/r/69562/#comment296719>

It will be good to add a comment here stating why we need to use table 
values here.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 7719 (patched)
<https://reviews.apache.org/r/69562/#comment296717>

Now that we support partitioned CTAS, we need to pass in partSpec here and
handle it.



ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java
Lines 415 (patched)
<https://reviews.apache.org/r/69562/#comment296718>

It will be good to add a comment for this.


- Ashutosh Chauhan


On Dec. 13, 2018, 4:50 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69562/
> ---
> 
> (Updated Dec. 13, 2018, 4:50 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16957
> https://issues.apache.org/jira/browse/HIVE-16957
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16957
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
> d26af3b08130ce26006cc57c53e68efca1d01166 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java 
> 859c18f3c2059a8e0d4e3fd7f62a521a72e691fd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 3a51d9795b0384356daa0a8ab576374fb05c3378 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java 
> 11ccff44588e20d6acc47af31bfa05e3beba7e2e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
> 9aff0069fd0170cfec877caf481e8b6653435b81 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> b330d710a185aa44c4a89088bb025ecb28ba8856 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 
> f0f7b18d192f85b489ccde4e8a80e92dc11a0494 
>   ql/src/test/queries/clientpositive/cbo_rp_cross_product_check_2.q 
> 00c19c74ad45fc13e0a2cf74af3f0fb33b73a1a3 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q 
> 9735e61598520469f176719bc51b4437204fd522 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_2.q 
> 3f695d1ee212902a0415ac2912a4f15d521cd380 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_3.q 
> eb668a90acb546504cffb994ce25a1ab03c5b0c0 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_4.q 
> f21db8a8d87fe47eb22a3c43d0856dd5463a1671 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q 
> 3026d9093eddbf53611a79df5fbe1ee55884273a 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_dummy.q 
> 8c9da8ae69967d1e11a333a46fddc51957cd5f31 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_multi_db.q
>  85d926f9eb8c40d01bee6dab87baf5bf29790278 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_rebuild_dummy.q
>  72e3d65117c0712929bb217e3a5e769101b27ebd 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_time_window.q
>  4cdb715d2873b726561979a5b6d93c086467cd3d 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_time_window_2.q
>  6873673a55580b3d94f2af4b7c7c0b20f191d879 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_1.q 
> 18b9f7d418eff200d551ce4f95a399741dfca3f8 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_10.q 
> 95427923164a28b1b0ef73f03fa69544daa5ff1c 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_2.q 
> 3a447fc1873bf98d748fb9fd09278d60b7c9ac55 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_3.q 
> 0823f59394dd00d9d02b0ae517454c035b21baed 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_4.q 
> 6724cec7710f981d094576f6befccd2491ebe936 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_5.q 
> d87928c07363f6bd0ba4ae8f1986f5f53f513731 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_6.q 
> 23fc3c14ce5a0c43483824de9ac0835453f74c44 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_7.q 
> 3d1cedc4f56a1bceecba390292fcd26ad6ce1863 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_8.q 
> cfcfddce506d

Re: Review Request 69562: HIVE-16957

2018-12-20 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69562/#review211472
---




ql/src/test/queries/clientpositive/cbo_rp_cross_product_check_2.q
Line 7 (original), 7 (patched)
<https://reviews.apache.org/r/69562/#comment296644>

Is it necessary to specify schema to get auto-gather to work with CTAS?



ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_3.q.out
Lines 231-240 (original)
<https://reviews.apache.org/r/69562/#comment296646>

Did we lose stats auto-gather for this merge statement?



ql/src/test/results/clientpositive/llap/enforce_constraint_notnull.q.out
Lines 4640-4648 (original)
<https://reviews.apache.org/r/69562/#comment296649>

No more auto-gather stats?



ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out
Lines 2952-2960 (original)
<https://reviews.apache.org/r/69562/#comment296652>

Losing stats branch.



ql/src/test/results/clientpositive/llap/runtime_stats_merge.q.out
Lines 169-176 (original)
<https://reviews.apache.org/r/69562/#comment296654>

Losing stats branch.



ql/src/test/results/clientpositive/llap/semijoin_hint.q.out
Lines 3361-3368 (original)
<https://reviews.apache.org/r/69562/#comment296655>

Losing stats branch.



ql/src/test/results/clientpositive/llap/sqlmerge.q.out
Lines 213-219 (original)
<https://reviews.apache.org/r/69562/#comment296656>

Losing stats branch.



ql/src/test/results/clientpositive/llap/tez_nway_join.q.out
Line 63 (original), 63 (patched)
<https://reviews.apache.org/r/69562/#comment296657>

Did we lose stats in this scenario?



ql/src/test/results/clientpositive/llap/vector_udf2.q.out
Line 291 (original), 291 (patched)
<https://reviews.apache.org/r/69562/#comment296661>

Are we losing stats here?



ql/src/test/results/clientpositive/parallel_orderby.q.out
Line 97 (original)
<https://reviews.apache.org/r/69562/#comment296645>

Uhh.. we actually had a bug checked in the golden files :(


- Ashutosh Chauhan


On Dec. 13, 2018, 4:50 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69562/
> ---
> 
> (Updated Dec. 13, 2018, 4:50 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-16957
> https://issues.apache.org/jira/browse/HIVE-16957
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-16957
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/hooks/TestHs2Hooks.java 
> d26af3b08130ce26006cc57c53e68efca1d01166 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java 
> 859c18f3c2059a8e0d4e3fd7f62a521a72e691fd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 3a51d9795b0384356daa0a8ab576374fb05c3378 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsAutoGatherContext.java 
> 11ccff44588e20d6acc47af31bfa05e3beba7e2e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/ColumnStatsSemanticAnalyzer.java 
> 9aff0069fd0170cfec877caf481e8b6653435b81 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> b330d710a185aa44c4a89088bb025ecb28ba8856 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/CreateViewDesc.java 
> f0f7b18d192f85b489ccde4e8a80e92dc11a0494 
>   ql/src/test/queries/clientpositive/cbo_rp_cross_product_check_2.q 
> 00c19c74ad45fc13e0a2cf74af3f0fb33b73a1a3 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite.q 
> 9735e61598520469f176719bc51b4437204fd522 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_2.q 
> 3f695d1ee212902a0415ac2912a4f15d521cd380 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_3.q 
> eb668a90acb546504cffb994ce25a1ab03c5b0c0 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_4.q 
> f21db8a8d87fe47eb22a3c43d0856dd5463a1671 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q 
> 3026d9093eddbf53611a79df5fbe1ee55884273a 
>   ql/src/test/queries/clientpositive/materialized_view_create_rewrite_dummy.q 
> 8c9da8ae69967d1e11a333a46fddc51957cd5f31 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_multi_db.q
>  85d926f9eb8c40d01bee6dab87baf5bf29790278 
>   
> ql/src/test/queries/clientpositive/materialized_view_create_rewrite_rebuild_dummy.q
>  72e3d65117c0712929bb217e3a5e769101b27ebd 
>   
> ql/src/test/queries/clientpositive/mater

Re: Review Request 69077: HIVE-20748

2018-12-19 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69077/#review211439
---




ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
Lines 1275 (patched)
<https://reviews.apache.org/r/69077/#comment296550>

We can remove this statement; it doesn't provide any extra info. Just keep
the first line.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
Line 70 (original), 64 (patched)
<https://reviews.apache.org/r/69077/#comment296551>

Better name : resultCacheInvalidReason;



ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Line 385 (original), 385 (patched)
<https://reviews.apache.org/r/69077/#comment296552>

Name : canCBOHandleReason.



ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java
Line 558 (original), 558 (patched)
<https://reviews.apache.org/r/69077/#comment296553>

Can we drop public? And leave it as protected or default.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Line 400 (original), 400 (patched)
<https://reviews.apache.org/r/69077/#comment296554>

invalidResultCacheReason.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 13722 (patched)
<https://reviews.apache.org/r/69077/#comment296555>

We can remove this sentence.


- Ashutosh Chauhan


On Oct. 19, 2018, 11:26 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69077/
> ---
> 
> (Updated Oct. 19, 2018, 11:26 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20748
> https://issues.apache.org/jira/browse/HIVE-20748
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20748
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 807f159daa98d40e667914adc6c53fb8ecabf998 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
>  df216e7555bff4756130f5e097bdb6b0e5e7eef5 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 22f3266c87f1d42c254893b424b68e757fb2953b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 
> be1c59f93272352705731c8c7a02433c7ac3d6dc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> eed875e7a4475f207727d5d536521fdba0c329fb 
>   ql/src/test/queries/clientnegative/materialized_view_no_cbo_rewrite.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/materialized_view_no_cbo_rewrite_2.q 
> PRE-CREATION 
>   
> ql/src/test/queries/clientnegative/materialized_view_no_supported_op_rewrite.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientnegative/materialized_view_no_supported_op_rewrite_2.q
>  PRE-CREATION 
>   ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite_2.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/materialized_view_no_supported_op_rewrite.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientnegative/materialized_view_no_supported_op_rewrite_2.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69077/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Re: Review Request 69512: HIVE-21006

2018-12-05 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69512/#review211064
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
Lines 287 (patched)
<https://reviews.apache.org/r/69512/#comment295919>

Modulo any semijoin branches or semijoin branch going from one TS to 
another TS which is under consideration for merge?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
Lines 313 (patched)
<https://reviews.apache.org/r/69512/#comment295920>

This extended check wasn't there earlier, do we need it now?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java
Lines 944 (patched)
<https://reviews.apache.org/r/69512/#comment295935>

Don't we need to check that the source and target of the SJ are the same table
on different TSs?


- Ashutosh Chauhan


On Dec. 5, 2018, 6:39 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69512/
> ---
> 
> (Updated Dec. 5, 2018, 6:39 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-21006
> https://issues.apache.org/jira/browse/HIVE-21006
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-21006
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> c2456714c2693066dffc50319c3aaa1f4760ade5 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/SharedWorkOptimizer.java 
> 0cb3b21fd81ec8c86127c2eaeb9f5e5d23291455 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java 
> dd1d6a1924f0894e1f24c1eab6655ed3264025fc 
> 
> 
> Diff: https://reviews.apache.org/r/69512/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

[ANNOUNCE] New committer: Bharathkrishna Guruvayoor Murali

2018-12-01 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Bharathkrishna
Guruvayoor Murali to become a committer, and we are pleased to announce that
he has accepted.

Bharath, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

[ANNOUNCE] New committer: Mahesh Behera

2018-11-16 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Mahesh
Behera to become a committer, and we are pleased to announce that he has
accepted.
Mahesh, welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Thanks,
Ashutosh Chauhan (on behalf of the Apache Hive PMC)

Re: Review Request 69266: HIVE-20775

2018-11-10 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69266/#review210454
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1531 (patched)
<https://reviews.apache.org/r/69266/#comment295122>

Should this logic run as part of StatsRulesProcFactory when it's computing
stats for TS, since DPP branches are already created by then? This would
ensure that the op tree's stats are updated for DPP, so all walkers on the
tree will see them. All downstream ops would then also compute their stats with
DPP for TS accounted for in StatsRulesProcFactory. As it's currently written,
these stats are visible only after SJ rules.



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1541 (patched)
<https://reviews.apache.org/r/69266/#comment295123>

Can't there be a SEL here and a FIL following that?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1567 (patched)
<https://reviews.apache.org/r/69266/#comment295124>

This is updating stats for FIL op. But won't we need to retrigger updates 
on all downstream ops?
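
A minimal sketch of the concern, assuming a hypothetical `OpSketch` node with a per-operator selectivity (these names are illustrative and are not Hive's operator or statistics classes): if only the FIL estimate is changed, its children keep the stale row count until the update is re-triggered on them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operator node; not Hive's Operator or Statistics classes. */
final class OpSketch {
    final String name;
    final double selectivity;                 // fraction of input rows this operator keeps
    final List<OpSketch> children = new ArrayList<>();
    long rows;                                // estimated output row count

    OpSketch(String name, double selectivity) {
        this.name = name;
        this.selectivity = selectivity;
    }

    /** Attaches a child and returns it so that chains can be built inline. */
    OpSketch addChild(OpSketch child) {
        children.add(child);
        return child;
    }

    /** Recomputes this operator's estimate and pushes the new value to every descendant. */
    void updateStats(long inputRows) {
        rows = Math.round(inputRows * selectivity);
        for (OpSketch child : children) {
            child.updateStats(rows);
        }
    }

    public static void main(String[] args) {
        OpSketch ts = new OpSketch("TS", 1.0);
        OpSketch fil = ts.addChild(new OpSketch("FIL", 0.5));
        OpSketch sel = fil.addChild(new OpSketch("SEL", 1.0));

        ts.updateStats(1000);                 // initial estimates for the whole chain
        fil.rows = 100;                       // adjust FIL only, e.g. after a reduction is applied
        System.out.println(sel.rows);         // SEL still carries the earlier estimate
        for (OpSketch child : fil.children) {
            child.updateStats(fil.rows);      // re-trigger the update downstream
        }
        System.out.println(sel.rows);         // SEL now reflects the adjusted FIL estimate
    }
}
```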



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Line 1560 (original), 1615 (patched)
<https://reviews.apache.org/r/69266/#comment295125>

Does this need instanceof check?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1628-1645 (patched)
<https://reviews.apache.org/r/69266/#comment295126>

I don't follow this logic or the role of reductionFactorMap. Can you please
add comments for it?



ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
Lines 1662 (patched)
<https://reviews.apache.org/r/69266/#comment295127>

Do we need to trigger updateStats() for downstream ops?


- Ashutosh Chauhan


On Nov. 7, 2018, 12:18 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69266/
> ---
> 
> (Updated Nov. 7, 2018, 12:18 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Deepak Jaiswal.
> 
> 
> Bugs: HIVE-20775
> https://issues.apache.org/jira/browse/HIVE-20775
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20783
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
>  32fba6c8ff80befdde55542a4ae83b619256632e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java 
> 91d2f1f09112b1fc73dc0f9d4ed2784880f7a721 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 
> b7adc485a70e148e71feb594f311bfad1763479d 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction.q.out 
> ff9d98c63efb894d0503ec16d0ab1e8005fa8f7e 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_4.q.out 
> cc1c06da6346950155cd37dba5b5711c2e582b2e 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw.q.out 
> bd7fcbd7951423094cfd8e960645773da2dba903 
>   
> ql/src/test/results/clientpositive/llap/dynamic_semijoin_reduction_sw2.q.out 
> abcbd9727a9502a2007ae91a59fa0c44e063b4e8 
>   ql/src/test/results/clientpositive/llap/dynamic_semijoin_user_level.q.out 
> bc9e6fb083e73cb9c2532c79c0db3997790e6bf4 
>   
> ql/src/test/results/clientpositive/llap/vectorized_dynamic_semijoin_reduction.q.out
>  89986fbb3065aa87e4504711c99cd796f3bd1f8d 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query23.q.out 
> ace7cf5b791fe6ff98d9d5055dc9022674225655 
>   ql/src/test/results/clientpositive/perf/tez/cbo_query54.q.out 
> eaf25363b166bc2105f64791a857707465ff2251 
>   ql/src/test/results/clientpositive/perf/tez/constraints/cbo_query54.q.out 
> 1cf3ce40745102346aa1f3496310be0cbcd7d4e3 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query10.q.out 
> 3fbd92878e0197ec0db1ce808f9bc4c0f5b255a3 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query12.q.out 
> 741bd90666c033a6874d1b2299a9404adf7e0ba4 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query13.q.out 
> 02966e4f474c8247e85230d24a3aee2b18962bd9 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query14.q.out 
> e8a6eaa464c17e2adeae3cb03ea0a8b083c1cef7 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query16.q.out 
> 3143be8480647bbbf47f13c12f83248980df4b95 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query17.q.out 
> e796101e4527f3ad418e28b7f93b3134ad4f8fc7 
>   ql/src/test/results/clientpositive/perf/tez/constraints/query18.q.o

[jira] [Created] (HIVE-20880) Update default value for hive.stats.filter.in.min.ratio

2018-11-07 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20880:
---

 Summary: Update default value for hive.stats.filter.in.min.ratio
 Key: HIVE-20880
 URL: https://issues.apache.org/jira/browse/HIVE-20880
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[ANNOUNCE] New PMC Member : Zoltan

2018-10-30 Thread Ashutosh Chauhan

 Hello Hive community,

I'm pleased to announce that Zoltan Haindrich has accepted the Apache
Hive PMC's invitation, and is our newest PMC member. Many thanks to
Zoltan for all of his hard work.

Please join me in congratulating Zoltan!

Thanks,
Ashutosh

Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type

2018-10-22 Thread Ashutosh Chauhan



> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out
> > Lines 56 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101646#file2101646line56>
> >
> > I expected 'Unknown' should have been char of length 6. Is there a 
> > reason to expand the length to 10?
> > As I mentioned previously, if the constant is of smaller length, then it
> > doesn't make a difference but is unnecessary; but if the constant is of
> > bigger length than the LHS, then char::compare() actually truncates the
> > constant, so it is better to create the char with the original length of
> > the constant.
> 
> Zoltan Haindrich wrote:
> It worked the other way around before: constants are expanded to the
> target type. The addition I've made is that if the constant is longer, then
> it's made invalid.

I think it's better to let the runtime dictate the semantics in such cases. So
we just create a constant char of its original length, and then we get whatever
the runtime does with it, instead of "pre-processing" the value at compile
time. Another way to think about this: if there are 2 cols of char(5) and
char(10), what will the runtime do? The runtime already has logic to handle
such cases, so we let it handle that.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/in_typecheck_varchar.q.out
> > Lines 42 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101647#file2101647line42>
> >
> > This is inconsistent. Char and string comparison happens in char. But, 
> > varchar and string comparison happens in String. Was this behavior present 
> > before this patch too?
> 
> Zoltan Haindrich wrote:
> yes; only char is handled differently

Can you leave a TODO to recheck it later?


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/infer_const_type.q.out
> > Line 145 (original), 145 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101648#file2101648line145>
> >
> > Is 'or null' because of fl = 'float' OR
> >   db = 'double'? I expected that to become "or false". Though "or
> > null" will evaluate to the same result, "or false" is what I would expect.
> 
> Zoltan Haindrich wrote:
> the appearance of " or null" is because of the TODO fix in UDFOPEquals:
> 
> 
> https://github.com/kgyrtkirk/hive/blob/93f5a429fe5eb26f0b720fcef729efebe0549d76/ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java#L1179
> 
> I'm not sure why it wasn't removed in this case.
> 
> should I change it back and address it later?

Let's keep the change. The code change we have made is certainly valid.


> On Oct. 20, 2018, 6:57 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/results/clientpositive/join45.q.out
> > Line 717 (original), 717 (patched)
> > <https://reviews.apache.org/r/69019/diff/2/?file=2101649#file2101649line717>
> >
> > As discussed this should have been
> > (struct(cast (_col0 as double), cast(_col2 as double))) IN (const 
> > struct(100.0D,100.0D), const struct(101.0D,101.0D), const 
> > struct(102.0D,102.0D))
> 
> Zoltan Haindrich wrote:
> * column has type string.
> * the IN statement may have the same type for all element in this case 
> it's int...
> * in this case to change the left hand side to double: it may work
> 
> 
> but what should happen in case the IN operands contain 1 int, 1 decimal 
> and 1 string ?
> 
> note: during UDF evaluations there is a Set with all the values; and 
> "contains" is used - so if the inferred type is not a numeric: 
> 
> I've just checked that even the standard IN doesn't work like this:
> 
> ```
> create table t (a string);
> 
> insert into t values ('1'),('x'),('2.0');
> 
> select * from t where a in (1.0,'x',2);
> 1
> 2.0
> -- it doesn't return 'x' because it's casted to double
> 
> ```
> 
> I'm starting to think that it would be better to do this with the IN
> unwound into ORs, so that we could do the one-on-one constant checks, and
> then pointlookupoptimizer might collapse them if the types are the same. In
> this case I think we would not lose 'x' in the above case, and it would
> also make this whole recursive typecheck unnecessary.

We should strive to match the semantics that are already there. In Hive, for
expr str_col = 12, we get a cast to double on both sides. So, by ext

Re: Review Request 69017: HIVE-20718

2018-10-21 Thread Ashutosh Chauhan



> On Oct. 19, 2018, 6:34 a.m., Ashutosh Chauhan wrote:
> > data/scripts/q_perf_test_init_constraints.sql
> > Lines 5-8 (patched)
> > <https://reviews.apache.org/r/69017/diff/2/?file=2098905#file2098905line5>
> >
> > Shall we use varchar() to match original tpcds spec?
> 
> Jesús Camacho Rodríguez wrote:
> This is apparently not changed in TPCDS benchmark we run on Hive either:
> 
> https://github.com/hortonworks/hive-testbench/blob/hdp3/ddl-tpcds/text/alltables.sql
> 
> I believe we should change it indeed, but in both places. Gopal has just 
> told me he plans to change it shortly, together with decimal and date types. 
> Should we tackle in follow-up? (We would change it for perf driver without 
> constraints too).

Yeah, it's better to test what's in the standard, since we aspire to test the
standard as it is in the future. A TODO is fine for now.


> On Oct. 19, 2018, 6:34 a.m., Ashutosh Chauhan wrote:
> > data/scripts/q_perf_test_init_constraints.sql
> > Lines 664 (patched)
> > <https://reviews.apache.org/r/69017/diff/2/?file=2098905#file2098905line664>
> >
> > Any reason for not adding this constraint?
> 
> Jesús Camacho Rodríguez wrote:
> Yes, I had seen that. This is commented out in 
> https://github.com/hortonworks/hive-testbench/blob/hdp3/ddl-tpcds/bin_partitioned/add_constraints.sql
>  . I checked it and cr_ship_date_sk is not present in catalog_returns table. 
> I checked the tpcds spec and it is not there either. However, it is mentioned 
> somewhere in the spec in the context of materializations. I just left it 
> there commented out for reference, but I can remove it too...

Leaving it commented out is fine.


> On Oct. 19, 2018, 6:34 a.m., Ashutosh Chauhan wrote:
> > itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
> > Lines 294-297 (patched)
> > <https://reviews.apache.org/r/69017/diff/2/?file=2098909#file2098909line294>
> >
> > What's the reason for this?
> 
> Jesús Camacho Rodríguez wrote:
> These are flaky, i.e., plan string changes slightly among different 
> executions. For those that contain window functions, the problem was 
> https://issues.apache.org/jira/browse/CALCITE-2622 , which I already fixed. 
> Then there are a couple of queries for which Calcite generates a synthetic 
> field with an identifier during optimization, and the identifier is changing 
> among executions. I have not explored that one yet, but I plan to do it. In 
> any case, the plan generation does not fail, it is just flakiness.

sounds good.


> On Oct. 19, 2018, 6:34 a.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
> > Lines 539-550 (patched)
> > <https://reviews.apache.org/r/69017/diff/2/?file=2098914#file2098914line539>
> >
> > Is this pure refactoring? Or is there a logic change too?
> 
> Jesús Camacho Rodríguez wrote:
> This is almost pure refactoring. The only change is in new L654 : the 
> _if_ clause was outside the previous _if_ clause, which was wrong and was 
> causing hitting the assertion in L658 for a couple of tpcds queries. The 
> reason is that we should only do the inner _if_ verification if we have 
> detected that the FK-PK is present in the join, not in all cases.

ok


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69017/#review209776
---


On Oct. 20, 2018, 1:10 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69017/
> ---
> 
> (Updated Oct. 20, 2018, 1:10 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20718
> https://issues.apache.org/jira/browse/HIVE-20718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20718
> 
> 
> Diffs
> -
> 
>   data/conf/perf-reg/tez/hive-site.xml 
> 78a5481e0333a3ce9bc516e03273abe6a51c9a49 
>   data/scripts/q_perf_test_init.sql d27215b4cb570c0680212157ebb348e819ad802f 
>   data/scripts/q_perf_test_init_constraints.sql PRE-CREATION 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
>  98ceb214047ba56fc2e1ebabc7b0860f22524203 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfConstraintsCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> 8349e3d84eeae4695cf91

Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type

2018-10-21 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69019/#review209831
---




ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Line 1171 (original), 1176 (patched)
<https://reviews.apache.org/r/69019/#comment294429>

Can remove this TODO now.



ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Lines 1386-1397 (patched)
<https://reviews.apache.org/r/69019/#comment294430>

Why do we need to add this logic now? Prior to this patch, we would have
coerced the RHS constant to decimal if the column is of type decimal.

AFAICS, this if() can only be executed if the query had either float_col
= 4.2BD or double_col = 4.2BD, but those will be handled by
if (PrimitiveObjectInspectorUtils.floatTypeEntry.equals(primitiveTypeEntry)) {
else if (PrimitiveObjectInspectorUtils.doubleTypeEntry.equals(primitiveTypeEntry))



ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Line 1298 (original), 1404 (patched)
<https://reviews.apache.org/r/69019/#comment294431>

My suggestion is to do:
final HiveChar newValue = new HiveChar(constValue, constValue.length());


- Ashutosh Chauhan


On Oct. 20, 2018, 5:51 a.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69019/
> ---
> 
> (Updated Oct. 20, 2018, 5:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Vineet Garg.
> 
> 
> Bugs: HIVE-20617
> https://issues.apache.org/jira/browse/HIVE-20617
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For IN expressions the types were never corrected; and pointlookupoptimizer 
> was probably leaving behind fields already which were uncomparable; 
> HIVE-20296 exposed it further by changing the minimal number from  32 to 2.
> 
> This change generalizes the retyping of constants to also run it for the IN 
> operator ; and also for struct-s.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/HiveChar.java 29dc06dca1 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java e7d71595c7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 4968d16876 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java 
> c274fd7cc9 
>   ql/src/test/queries/clientpositive/in_typecheck_char.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_pointlook.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_varchar.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_partition_coltype.q.out f6c3c5642e 
>   ql/src/test/results/clientpositive/in_typecheck2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_char.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_varchar.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_const_type.q.out e1d7de5422 
>   ql/src/test/results/clientpositive/join45.q.out 47aaf7d0ab 
>   ql/src/test/results/clientpositive/join47.q.out 4d9e937815 
>   ql/src/test/results/clientpositive/llap/dec_str.q.out 554031e952 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out f240468558 
>   ql/src/test/results/clientpositive/llap/lineage3.q.out cf38816127 
>   ql/src/test/results/clientpositive/llap/vectorization_13.q.out 4ce654f960 
>   ql/src/test/results/clientpositive/llap/vectorization_6.q.out a2f730beca 
>   ql/src/test/results/clientpositive/llap/vectorization_8.q.out 21ce7b8ebd 
>   ql/src/test/results/clientpositive/llap/vectorization_short_regress.q.out 
> 7f1c6a295e 
>   ql/src/test/results/clientpositive/mapjoin47.q.out 294dd69de5 
>   ql/src/test/results/clientpositive/parquet_vectorization_13.q.out 
> 0efce98b55 
>   ql/src/test/results/clientpositive/parquet_vectorization_6.q.out 0bb6888364 
>   ql/src/test/results/clientpositive/parquet_vectorization_8.q.out 957bd7b264 
>   ql/src/test/results/clientpositive/ppd_udf_col.q.out 814fb5afcf 
>   ql/src/test/results/clientpositive/spark/cbo_simple_select.q.out acf91bf178 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_13.q.out 
> 3812239343 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_6.q.out 
> 6108457aad 
>   ql/src/test/results/clientpositive/spark/parquet_vectorization_8.q.out 
> 3352dedc58 
>   ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 
> f5a4c9ad86 
>   q

Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type

2018-10-20 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69019/#review209827
---




ql/src/test/queries/clientpositive/in_typecheck_pointlook.q
Lines 2 (patched)
<https://reviews.apache.org/r/69019/#comment294417>

Any reason for this?



ql/src/test/results/clientpositive/alter_partition_coltype.q.out
Line 163 (original), 163 (patched)
<https://reviews.apache.org/r/69019/#comment294416>

String and int comparison happens in double. So, should this be 3.0D ?



ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out
Lines 56 (patched)
<https://reviews.apache.org/r/69019/#comment294418>

I expected 'Unknown' should have been char of length 6. Is there a reason 
to expand the length to 10?
As I mentioned previously, if the constant is of smaller length, then it
doesn't make a difference but is unnecessary; but if the constant is of bigger
length than the LHS, then char::compare() actually truncates the constant, so
it is better to create the char with the original length of the constant.



ql/src/test/results/clientpositive/in_typecheck_varchar.q.out
Lines 42 (patched)
<https://reviews.apache.org/r/69019/#comment294419>

This is inconsistent. Char and string comparison happens in char. But, 
varchar and string comparison happens in String. Was this behavior present 
before this patch too?



ql/src/test/results/clientpositive/infer_const_type.q.out
Line 145 (original), 145 (patched)
<https://reviews.apache.org/r/69019/#comment294420>

Is 'or null' because of fl = 'float' OR
  db = 'double'? I expected that to become "or false". Though "or null"
will evaluate to the same result, "or false" is what I would expect.



ql/src/test/results/clientpositive/join45.q.out
Line 717 (original), 717 (patched)
<https://reviews.apache.org/r/69019/#comment294421>

As discussed this should have been
(struct(cast (_col0 as double), cast(_col2 as double))) IN (const 
struct(100.0D,100.0D), const struct(101.0D,101.0D), const struct(102.0D,102.0D))



ql/src/test/results/clientpositive/parquet_vectorization_13.q.out
Line 86 (original), 86 (patched)
<https://reviews.apache.org/r/69019/#comment294422>

Don't we print the f suffix for float constants, i.e., 3569.0f?


- Ashutosh Chauhan


On Oct. 20, 2018, 5:51 a.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69019/
> ---
> 
> (Updated Oct. 20, 2018, 5:51 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Vineet Garg.
> 
> 
> Bugs: HIVE-20617
> https://issues.apache.org/jira/browse/HIVE-20617
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For IN expressions the types were never corrected; and pointlookupoptimizer 
> was probably leaving behind fields already which were uncomparable; 
> HIVE-20296 exposed it further by changing the minimal number from  32 to 2.
> 
> This change generalizes the retyping of constants to also run it for the IN 
> operator ; and also for struct-s.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/type/HiveChar.java 29dc06dca1 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java e7d71595c7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 4968d16876 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java 
> c274fd7cc9 
>   ql/src/test/queries/clientpositive/in_typecheck_char.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_pointlook.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck_varchar.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_partition_coltype.q.out f6c3c5642e 
>   ql/src/test/results/clientpositive/in_typecheck2.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_char.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_pointlook.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/in_typecheck_varchar.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/infer_const_type.q.out e1d7de5422 
>   ql/src/test/results/clientpositive/join45.q.out 47aaf7d0ab 
>   ql/src/test/results/clientpositive/join47.q.out 4d9e937815 
>   ql/src/test/results/clientpositive/llap/dec_str.q.out 554031e952 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out f240468558 
>   ql/src/test/results/clientpositive/llap/lineage3.q.out cf38816127 
>   ql/src/test/results/clientpositive/llap/vectorization_13.q.out 4ce654f960 
>   ql/src/test/results/clientpositive/llap/vec

Re: Review Request 69019: HIVE-20617 Fix type of constants in IN expressions to have correct type

2018-10-19 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69019/#review209778
---

ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Line 1172 (original), 1181 (patched)
<https://reviews.apache.org/r/69019/#comment294365>

This TODO will be good to resolve.
We have type already so we can return null constant of appropriate type 
here, no?

ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Lines 1404 (patched)
<https://reviews.apache.org/r/69019/#comment294366>

Add a comment: comparison of decimal and double happens in double.

ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java
Lines 1292-1294 (original), 1416-1418 (patched)
<https://reviews.apache.org/r/69019/#comment294367>

Not sure this logic is correct. We should not coerce the length of the
constant to be the same as the length of the type.

Comparisons of 2 char types happen on stripped values. So, if the constant is
of smaller length, this probably won't be a problem but is unnecessary.
However, if the constant is longer, it looks like HiveChar will truncate it,
and then the comparison will likely be wrong. It is better to create a constant
char of the same length as the string.
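
A minimal sketch of the truncation concern, assuming a hypothetical `PaddedCharSketch` class that keeps at most n characters and compares on stripped values (it only models the behavior described above and is not Hive's HiveChar):

```java
/** Hypothetical stand-in for a fixed-length char(n) value; not Hive's HiveChar. */
final class PaddedCharSketch {
    private final String value;

    /** Keeps at most maxLength characters, as a char(maxLength) type would. */
    PaddedCharSketch(String raw, int maxLength) {
        this.value = raw.length() > maxLength ? raw.substring(0, maxLength) : raw;
    }

    /** Drops trailing spaces, since the comparison ignores padding. */
    private static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    /** Compares two char values on their stripped contents. */
    boolean sameAs(PaddedCharSketch other) {
        return stripTrailingSpaces(value).equals(stripTrailingSpaces(other.value));
    }

    public static void main(String[] args) {
        PaddedCharSketch column = new PaddedCharSketch("abc", 3); // value stored in a char(3) column
        String constant = "abcdef";                                 // literal from the query

        // Coercing the constant to the column length cuts it down to "abc": a spurious match.
        System.out.println(column.sameAs(new PaddedCharSketch(constant, 3)));
        // Keeping the constant's own length preserves "abcdef": the values differ.
        System.out.println(column.sameAs(new PaddedCharSketch(constant, constant.length())));
    }
}
```

In this model the first comparison succeeds only because the constant was truncated, which is the wrong result the comment warns about.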

ql/src/test/queries/clientpositive/in_typecheck1.q
Lines 4 (patched)
<https://reviews.apache.org/r/69019/#comment294369>

Can you repeat this test with s varchar(1), t varchar(10); ?

ql/src/test/queries/clientpositive/in_typecheck1.q
Lines 7 (patched)
<https://reviews.apache.org/r/69019/#comment294368>

Add: 
select * from ax where t = 'a ';
select * from ax where t = 'a  ';
select * from ax where t = 'a  d';

RHS constant is of length 10,11,12. I expect first and second to return 2 
rows, while third to return 0 rows.

When t is varchar all 3 should return 0 rows.

ql/src/test/queries/clientpositive/in_typecheck1.q
Lines 10 (patched)
<https://reviews.apache.org/r/69019/#comment294370>

When s and t are varchar this should return 1 row.

ql/src/test/results/clientpositive/in_typecheck1.q.out
Lines 42 (patched)
<https://reviews.apache.org/r/69019/#comment294362>

This doesn't look correct. Char/varchar comparisons don't preserve
trailing spaces, e.g.:
create table t1 (a char(3), b varchar(4));
insert into t1 values ('a','b');
select * from t1 where b = 'b';
a   b
select * from t1 where a = 'a';
a   b

I got this result on both Postgres and Oracle, whereas I believe this change
would return empty results in Hive.

ql/src/test/results/clientpositive/join45.q.out
Line 717 (original), 717 (patched)
<https://reviews.apache.org/r/69019/#comment294363>

Comparison of str and integer is done in double. See, e.g.,
infer_const_type.q.out, where there is (UDFToDouble(str) = 1234.0D).
However, inside IN we are now casting the constant to string.
Granted, casting to double in this scenario is debatable, but Hive has had this
behavior since the beginning, and it needs to be consistent between IN and
direct comparison.
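
To illustrate what "consistent with direct comparison" would mean, here is a rough sketch that evaluates IN after casting both sides to double. The class and method names are hypothetical and this is not Hive's evaluation code; a value that cannot be parsed as a number is modeled as NULL and never matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of evaluating IN after casting both sides to double. */
final class InAsDoubleSketch {

    /** Returns the numeric value of s, or null when s is not a number (a failed cast). */
    static Double toDouble(String s) {
        try {
            return Double.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Keeps the rows whose double value equals the double value of some IN-list element. */
    static List<String> filterIn(List<String> rows, List<String> inList) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            Double lhs = toDouble(row);
            if (lhs == null) {
                continue; // a NULL left-hand side never equals anything
            }
            for (String candidate : inList) {
                Double rhs = toDouble(candidate);
                if (rhs != null && lhs.doubleValue() == rhs.doubleValue()) {
                    kept.add(row);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // String column values compared against a mixed IN list.
        List<String> rows = Arrays.asList("1", "x", "2.0");
        System.out.println(filterIn(rows, Arrays.asList("1.0", "x", "2")));
    }
}
```

Under this model '1' and '2.0' are kept while 'x' is dropped, because 'x' has no double value on either side.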

ql/src/test/results/clientpositive/llap/subquery_scalar.q.out
Line 371 (original), 370 (patched)
<https://reviews.apache.org/r/69019/#comment294364>

Unrelated to this patch, but I think subq decorrelation logic incorrectly 
generated this filter. It should really have been p_name is null.

- Ashutosh Chauhan

On Oct. 15, 2018, 6:05 a.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69019/
> ---
> 
> (Updated Oct. 15, 2018, 6:05 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20617
> https://issues.apache.org/jira/browse/HIVE-20617
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> For IN expressions the types were never corrected; and pointlookupoptimizer 
> was probably leaving behind fields already which were uncomparable; 
> HIVE-20296 exposed it further by changing the minimal number from  32 to 2.
> 
> This change generalizes the retyping of constants to also run it for the IN 
> operator ; and also for struct-s.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 4968d16876c5c9cc36ec9a3ec48c2740c2c67dcd 
>   ql/src/test/queries/clientpositive/in_typecheck1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/in_typecheck2.q PRE-CREATION 
>   ql/src/test/results/clientpositive/alter_partition_coltype.q.out 
> 5727f0a65c6e4736f41017e4e962d932dedbd6bd 
>

Re: Review Request 69017: HIVE-20718

2018-10-19 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69017/#review209776
---




data/scripts/q_perf_test_init_constraints.sql
Lines 5-8 (patched)
<https://reviews.apache.org/r/69017/#comment294352>

Shall we use varchar() to match original tpcds spec?



data/scripts/q_perf_test_init_constraints.sql
Lines 160 (patched)
<https://reviews.apache.org/r/69017/#comment294353>

Any reason to create external tables and not managed tables?



data/scripts/q_perf_test_init_constraints.sql
Lines 177 (patched)
<https://reviews.apache.org/r/69017/#comment294354>

It would be better to have these as ACID tables.



data/scripts/q_perf_test_init_constraints.sql
Lines 664 (patched)
<https://reviews.apache.org/r/69017/#comment294355>

Any reason for not adding this constraint?



itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java
Lines 294-297 (patched)
<https://reviews.apache.org/r/69017/#comment294356>

What's the reason for this?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java
Lines 539-550 (patched)
<https://reviews.apache.org/r/69017/#comment294358>

Is this pure refactoring? Or is there a logic change too?



ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
Lines 196 (patched)
<https://reviews.apache.org/r/69017/#comment294357>

Can you add this as nonReserved in IdentifierParser.g so that cbo doesn't
become a reserved keyword?


- Ashutosh Chauhan


On Oct. 17, 2018, 6:42 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69017/
> ---
> 
> (Updated Oct. 17, 2018, 6:42 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20718
> https://issues.apache.org/jira/browse/HIVE-20718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20718
> 
> 
> Diffs
> -
> 
>   data/conf/perf-reg/tez/hive-site.xml 
> 78a5481e0333a3ce9bc516e03273abe6a51c9a49 
>   data/scripts/q_perf_test_init_constraints.sql PRE-CREATION 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfCliDriver.java
>  98ceb214047ba56fc2e1ebabc7b0860f22524203 
>   
> itests/qtest/src/test/java/org/apache/hadoop/hive/cli/TestTezPerfConstraintsCliDriver.java
>  PRE-CREATION 
>   itests/src/test/resources/testconfiguration.properties 
> b6d42c64af002aa37a9c088dea8ea6b41c96c950 
>   
> itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java 
> 5e1e88e89d29f94006764e09cf1c60e58cffdc54 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java 
> b4d5806d4ed7f23f2dcf5299fe3c0b2fbe22ff80 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 
> 46bf088f2c5ca97e11cc7ab939ef8ddaefd453c6 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/ATSHook.java 
> 92fcfec673fd7d0cb2ce92bd6fffa3eee2b9b1da 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java 
> 0af30d48f32f1e6a8286c869db9182ba9c5557ed 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java 
> dc0a84b37dc3213c52d331471cb0e16bd499886e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinConstraintsRule.java
>  0a307f248aff8c72e7c52a425181a50dd5dd2023 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 22f3266c87f1d42c254893b424b68e757fb2953b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainConfiguration.java 
> a92502e74646f15a68a3fd488d71570d7a068566 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainSemanticAnalyzer.java 
> 49b614634ff9196d5ef97a300105414f857200bd 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g 
> 8bf9cc0ad69dc96f022cde6f500dd3cd68bf9300 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 
> bc95c46d24098fa5706ab8178eddd0d744d4f57d 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ExplainWork.java 
> 01da4d558d0737df0408fe5d9050641cca550e46 
>   
> ql/src/test/org/apache/hadoop/hive/ql/parse/TestUpdateDeleteSemanticAnalyzer.java
>  932f4e850b6b197b3cc67df007341f2e49900921 
>   ql/src/test/queries/clientpositive/perf/cbo_query1.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/cbo_query10.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/cbo_query11.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/cbo_query12.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/cbo_query13.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/cbo_query14.q PRE

Re: Review Request 69078: HIVE-20767

2018-10-19 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69078/#review209775
---




ql/src/test/results/clientpositive/llap/mapjoin_hint.q.out
Lines 469-470 (original), 470-472 (patched)
<https://reviews.apache.org/r/69078/#comment294348>

This plan looks slower than previous with extra shuffle.



ql/src/test/results/clientpositive/llap/subquery_scalar.q.out
Lines 240-241 (original), 242-244 (patched)
<https://reviews.apache.org/r/69078/#comment294349>

This plan looks slower than previous with extra shuffle.



ql/src/test/results/clientpositive/llap/subquery_scalar.q.out
Line 1005 (original), 1038 (patched)
<https://reviews.apache.org/r/69078/#comment294350>

This plan looks slower than previous with extra shuffle.



ql/src/test/results/clientpositive/llap/subquery_select.q.out
Lines 4905-4907 (original), 4905-4907 (patched)
<https://reviews.apache.org/r/69078/#comment294351>

This plan looks slower than previous with extra shuffle.


- Ashutosh Chauhan


On Oct. 19, 2018, 12:18 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69078/
> ---
> 
> (Updated Oct. 19, 2018, 12:18 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Vineet Garg.
> 
> 
> Bugs: HIVE-20767
> https://issues.apache.org/jira/browse/HIVE-20767
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20767
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinProjectTransposeRule.java
>  e6844326068210e7ab7364ec9f3ec60908b36e88 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 22f3266c87f1d42c254893b424b68e757fb2953b 
>   ql/src/test/results/clientnegative/subquery_scalar_multi_rows.q.out 
> 0a780db7ef98ddf14fc18a90f2b628a13337bcda 
>   ql/src/test/results/clientpositive/bool_unknown.q.out 
> 8e9b48ccafce2093e91d7035fdf581018bab1979 
>   ql/src/test/results/clientpositive/llap/lineage2.q.out 
> d32f490a704e1ba6bdc54f4d54dff028c9ca974c 
>   ql/src/test/results/clientpositive/llap/mapjoin_hint.q.out 
> ac505a5c1e47211260a795390a3ca7c45ee30c01 
>   ql/src/test/results/clientpositive/llap/materialized_view_rewrite_7.q.out 
> 6f00a5c82b2c2784d2576222d6eb4bd9faf71bea 
>   ql/src/test/results/clientpositive/llap/multiMapJoin1.q.out 
> ae821f600b3e9bf3128e93c0a4097b438f26f35b 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 
> 166bd60ce25712021c06d3506246ff87d35058c4 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out 
> 854d215a2382e5b09714c7b8b9916cf048458f66 
>   ql/src/test/results/clientpositive/perf/spark/query54.q.out 
> f10250f307cc58486dbb202c95b2c8af7943e528 
>   ql/src/test/results/clientpositive/perf/tez/query54.q.out 
> 1c17d2a53a1be39b28d01412fe6b13878b1680f3 
>   ql/src/test/results/clientpositive/spark/subquery_scalar.q.out 
> b3252f54158d5401c3d10b0978af69b9dbba1290 
>   ql/src/test/results/clientpositive/spark/subquery_select.q.out 
> ad7c8a3fc795004ea5dbd2bc4355de24b73352be 
> 
> 
> Diff: https://reviews.apache.org/r/69078/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Re: Review Request 69077: HIVE-20748

2018-10-19 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69077/#review209774
---




ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
Lines 1768-1779 (patched)
<https://reviews.apache.org/r/69077/#comment294345>

Is there a reason to do these validations in separate passes? If they can 
be combined into one pass, that will help keep compiler latency low.



ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite.q.out
Lines 22 (patched)
<https://reviews.apache.org/r/69077/#comment294346>

Can we provide a better error message here? It should say that rewriting can't 
be enabled because the query contains an outer join.



ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite_2.q.out
Lines 36 (patched)
<https://reviews.apache.org/r/69077/#comment294347>

better error message.


- Ashutosh Chauhan


On Oct. 19, 2018, 12:17 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69077/
> ---
> 
> (Updated Oct. 19, 2018, 12:17 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20748
> https://issues.apache.org/jira/browse/HIVE-20748
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20748
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 
> 807f159daa98d40e667914adc6c53fb8ecabf998 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOpMaterializationValidator.java
>  df216e7555bff4756130f5e097bdb6b0e5e7eef5 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptAutomaticRewritingMaterializationValidator.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 22f3266c87f1d42c254893b424b68e757fb2953b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java 
> be1c59f93272352705731c8c7a02433c7ac3d6dc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> eed875e7a4475f207727d5d536521fdba0c329fb 
>   ql/src/test/queries/clientnegative/materialized_view_no_cbo_rewrite.q 
> PRE-CREATION 
>   ql/src/test/queries/clientnegative/materialized_view_no_cbo_rewrite_2.q 
> PRE-CREATION 
>   
> ql/src/test/queries/clientnegative/materialized_view_no_supported_op_rewrite.q
>  PRE-CREATION 
>   
> ql/src/test/queries/clientnegative/materialized_view_no_supported_op_rewrite_2.q
>  PRE-CREATION 
>   ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/materialized_view_no_cbo_rewrite_2.q.out 
> PRE-CREATION 
>   
> ql/src/test/results/clientnegative/materialized_view_no_supported_op_rewrite.q.out
>  PRE-CREATION 
>   
> ql/src/test/results/clientnegative/materialized_view_no_supported_op_rewrite_2.q.out
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69077/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Review Request 69031: Changed default config of hive.tez.llap.min.reducer.per.executor to 0.33

2018-10-16 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69031/
---

Review request for hive.


Bugs: HIVE-20572
https://issues.apache.org/jira/browse/HIVE-20572


Repository: hive-git


Description
---

Changed default config of hive.tez.llap.min.reducer.per.executor to 0.33


Diffs
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 29958b3e50 
  ql/src/test/results/clientpositive/llap/bucket_groupby.q.out 433e033b6e 
  ql/src/test/results/clientpositive/llap/cbo_limit.q.out 0d5c8f0e36 
  ql/src/test/results/clientpositive/llap/cbo_rp_limit.q.out 0d5c8f0e36 
  ql/src/test/results/clientpositive/llap/cbo_rp_views.q.out 878a767a19 
  ql/src/test/results/clientpositive/llap/cbo_views.q.out 214574ed61 
  ql/src/test/results/clientpositive/llap/cluster.q.out 056c4dac15 
  ql/src/test/results/clientpositive/llap/constraints_optimization.q.out 
b45b7c409f 
  ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 
0d32c395c6 
  ql/src/test/results/clientpositive/llap/cte_1.q.out 044fb70cbc 
  ql/src/test/results/clientpositive/llap/dp_counter_mm.q.out 4ca60ba5ce 
  ql/src/test/results/clientpositive/llap/dp_counter_non_mm.q.out 101b343506 
  ql/src/test/results/clientpositive/llap/except_distinct.q.out ea0224c888 
  ql/src/test/results/clientpositive/llap/hybridgrace_hashjoin_2.q.out 
da2be55462 
  ql/src/test/results/clientpositive/llap/intersect_all.q.out 1f6b0b872b 
  ql/src/test/results/clientpositive/llap/intersect_distinct.q.out b4c69b1505 
  ql/src/test/results/clientpositive/llap/lateral_view.q.out c1bca18a07 
  ql/src/test/results/clientpositive/llap/lineage2.q.out f56b100046 
  ql/src/test/results/clientpositive/llap/llap_decimal64_reader.q.out 
945dfd6a37 
  ql/src/test/results/clientpositive/llap/llap_smb.q.out ed10999f8f 
  ql/src/test/results/clientpositive/llap/materialized_view_create.q.out 
36a3d8c3bf 
  
ql/src/test/results/clientpositive/llap/materialized_view_create_rewrite_2.q.out
 d7c1ee15d2 
  ql/src/test/results/clientpositive/llap/materialized_view_describe.q.out 
2928fcfb9b 
  ql/src/test/results/clientpositive/llap/multi_count_distinct_null.q.out 
a049b02fda 
  ql/src/test/results/clientpositive/llap/parquet_types.q.out 508ac16878 
  ql/src/test/results/clientpositive/llap/parquet_types_vectorization.q.out 
4cc93bdd2a 
  ql/src/test/results/clientpositive/llap/partition_multilevels.q.out 
00d0a14515 
  ql/src/test/results/clientpositive/llap/reduce_deduplicate_extended.q.out 
53d0f3192d 
  ql/src/test/results/clientpositive/llap/results_cache_1.q.out 86a110bb83 
  ql/src/test/results/clientpositive/llap/results_cache_with_masking.q.out 
ba3b2804be 
  ql/src/test/results/clientpositive/llap/skiphf_aggr.q.out 667692a2d7 
  ql/src/test/results/clientpositive/llap/subquery_notin.q.out 083ad3074a 
  ql/src/test/results/clientpositive/llap/tez_input_counters.q.out 8f4f0551cd 
  ql/src/test/results/clientpositive/llap/tez_smb_reduce_side.q.out 9ee3dc161d 
  ql/src/test/results/clientpositive/llap/tez_union2.q.out b00c36ebb8 
  ql/src/test/results/clientpositive/llap/udaf_collect_set_2.q.out 4d6a68404e 
  ql/src/test/results/clientpositive/llap/unionDistinct_3.q.out 6cef15adc6 
  ql/src/test/results/clientpositive/llap/vector_complex_all.q.out f0f5fe7a8a 
  ql/src/test/results/clientpositive/llap/vector_grouping_sets.q.out 78de6807d3 
  ql/src/test/results/clientpositive/llap/vector_partitioned_date_time.q.out 
4711f35165 
  ql/src/test/results/clientpositive/llap/vector_ptf_part_simple.q.out 
6fa48e88a4 
  ql/src/test/results/clientpositive/llap/vector_windowing_expressions.q.out 
49daa409e3 
  
ql/src/test/results/clientpositive/llap/vector_windowing_multipartitioning.q.out
 7596c9a8c7 
  
ql/src/test/results/clientpositive/llap/vector_windowing_range_multiorder.q.out 
b906d156b5 
  ql/src/test/results/clientpositive/llap/vectorized_distinct_gby.q.out 
0cffc4e750 
  ql/src/test/results/clientpositive/llap/vectorized_parquet.q.out db7262f078 


Diff: https://reviews.apache.org/r/69031/diff/1/


Testing
---


Thanks,

Ashutosh Chauhan

Re: Review Request 69018: Follow up on review comments remove unwanted extra columns kerberos fix

2018-10-15 Thread Ashutosh Chauhan



> On Oct. 15, 2018, 12:47 a.m., Ashutosh Chauhan wrote:
> > kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java
> > Line 52 (original), 52 (patched)
> > <https://reviews.apache.org/r/69018/diff/1/?file=2097817#file2097817line52>
> >
> > This boolean is not used anywhere; it can be removed along with the associated 
> > property.
> 
> Slim Bouguerra wrote:
> For now I have kept the path for task-level commit as a hidden 
> feature that we may want to bring in later.
> What do you think?

Ok, we can keep it as an undocumented feature.


> On Oct. 15, 2018, 12:47 a.m., Ashutosh Chauhan wrote:
> > kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaUtils.java
> > Lines 284 (patched)
> > <https://reviews.apache.org/r/69018/diff/1/?file=2097822#file2097822line286>
> >
> > It's better to use a single principal (and its keytab) to avoid 
> > a dependency on both. Is that not possible? If not, add comments.
> 
> Slim Bouguerra wrote:
> We do not depend on both, but if the HS2 credentials are not present we will 
> use the LLAP ones. I am not sure how Ambari drops and distributes the keys, so it 
> is safer to have this check.
> I have added the comment already.

From a purist point of view, it's hard to reason about which principal this logic 
needs, precisely because we don't know which one will be used. In practice, 
however, since we usually don't make a distinction between the LLAP and HS2 
principals, this won't be an issue. So, we can keep this for now.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69018/#review209529
---


On Oct. 15, 2018, 12:07 a.m., Slim Bouguerra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69018/
> ---
> 
> (Updated Oct. 15, 2018, 12:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Gopal V, and Vineet Garg.
> 
> 
> Bugs: HIVE-20735
> https://issues.apache.org/jira/browse/HIVE-20735
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> As part of the review comments we agreed to:
> 
> remove start and end offsets columns
> remove the best effort mode
> make the 2pc as default protocol for EOS
> Also this patch will include an additional enhancement to add kerberos 
> support.
> 
> 
> Diffs
> -
> 
>   kafka-handler/README.md 706c77ae25 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java 
> 950f7315c2 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordReader.java 
> 746de61273 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java 
> 51cfa24929 
>   
> kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaStorageHandler.java 
> 0d64cd9c9c 
>   
> kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaTableProperties.java 
> 2e1f6faf1f 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaUtils.java 
> 6ae9c8d276 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaWritable.java 
> 681b666fdf 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/MetadataColumn.java 
> 60e1aea55d 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java 
> c95bdb02de 
>   
> kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaRecordIteratorTest.java
>  3d3f598bc0 
>   kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaUtilsTest.java 
> 8aebb9254e 
>   kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaWritableTest.java 
> 45bf7912c4 
>   
> kafka-handler/src/test/org/apache/hadoop/hive/kafka/SimpleKafkaWriterTest.java
>  d8168e02a0 
>   ql/src/test/queries/clientpositive/kafka_storage_handler.q 595f0320b6 
>   ql/src/test/results/clientpositive/druid/kafka_storage_handler.q.out 
> 73f0f293d9 
> 
> 
> Diff: https://reviews.apache.org/r/69018/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Slim Bouguerra
> 
>

[ANNOUNCE] New committer: Nishant Bangarwa

2018-10-15 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Nishant Bangarwa
to become a committer, and we are pleased to announce that he has accepted.

Nishant, welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

Re: Review Request 69018: Follow up on review comments remove unwanted extra columns kerberos fix

2018-10-14 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69018/#review209529
---




kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java
Line 52 (original), 52 (patched)
<https://reviews.apache.org/r/69018/#comment294021>

This boolean is not used anywhere; it can be removed along with the associated 
property.



kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java
Line 78 (original), 75 (patched)
<https://reviews.apache.org/r/69018/#comment294022>

This should always be true. Perhaps it should be removed as an argument.



kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaTableProperties.java
Line 55 (original), 56 (patched)
<https://reviews.apache.org/r/69018/#comment294023>

This can be removed and treated as true.



kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaUtils.java
Lines 284 (patched)
<https://reviews.apache.org/r/69018/#comment294024>

It's better to use a single principal (and its keytab) to avoid a dependency 
on both. Is that not possible? If not, add comments.


- Ashutosh Chauhan


On Oct. 15, 2018, 12:07 a.m., Slim Bouguerra wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69018/
> ---
> 
> (Updated Oct. 15, 2018, 12:07 a.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan, Gopal V, and Vineet Garg.
> 
> 
> Bugs: HIVE-20735
> https://issues.apache.org/jira/browse/HIVE-20735
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> As part of the review comments we agreed to:
> 
> remove start and end offsets columns
> remove the best effort mode
> make the 2pc as default protocol for EOS
> Also this patch will include an additional enhancement to add kerberos 
> support.
> 
> 
> Diffs
> -
> 
>   kafka-handler/README.md 706c77ae25 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaOutputFormat.java 
> 950f7315c2 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaRecordReader.java 
> 746de61273 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java 
> 51cfa24929 
>   
> kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaStorageHandler.java 
> 0d64cd9c9c 
>   
> kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaTableProperties.java 
> 2e1f6faf1f 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaUtils.java 
> 6ae9c8d276 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaWritable.java 
> 681b666fdf 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/MetadataColumn.java 
> 60e1aea55d 
>   kafka-handler/src/java/org/apache/hadoop/hive/kafka/SimpleKafkaWriter.java 
> c95bdb02de 
>   
> kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaRecordIteratorTest.java
>  3d3f598bc0 
>   kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaUtilsTest.java 
> 8aebb9254e 
>   kafka-handler/src/test/org/apache/hadoop/hive/kafka/KafkaWritableTest.java 
> 45bf7912c4 
>   
> kafka-handler/src/test/org/apache/hadoop/hive/kafka/SimpleKafkaWriterTest.java
>  d8168e02a0 
>   ql/src/test/queries/clientpositive/kafka_storage_handler.q 595f0320b6 
>   ql/src/test/results/clientpositive/druid/kafka_storage_handler.q.out 
> 73f0f293d9 
> 
> 
> Diff: https://reviews.apache.org/r/69018/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Slim Bouguerra
> 
>

[ANNOUNCE] New committer: Janaki Lahorani

2018-10-08 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Janaki
Lahorani to become a committer, and we are pleased to announce that she has
accepted.
Janaki, welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

Re: Review Request 68897: Allow merge statement to have column schema

2018-10-02 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68897/#review209179
---

ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
Line 1376 (original), 1376 (patched)
<https://reviews.apache.org/r/68897/#comment293503>

Can you add a comment on what this for loop is supposed to do?

ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
Lines 1409 (patched)
<https://reviews.apache.org/r/68897/#comment293504>

replaceDefaultKeywordForMerge() is index-based. That is, it assumes the 
values list is in the same order as the column list of the target table, which was 
true until now. With this change the columns can be in any order, so this may not 
work.

ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
Line 1388 (original), 1410 (patched)
<https://reviews.apache.org/r/68897/#comment293505>

valuesClause may need to reorder columns, which are in columnListNode order, 
to match the order of the target table. See my request to add new tests in later 
comments.
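
The reordering asked for above can be sketched roughly as follows. This is only an
illustration, not the code in the patch: the class and method names are
hypothetical, expressions are modelled as plain strings instead of AST nodes, and
filling columns that are absent from the column list with DEFAULT is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: illustrates name-based (not index-based) matching.
public class MergeInsertReorder {

    /**
     * Returns the VALUES expressions rearranged into the target table's column order.
     * Table columns missing from the INSERT column list are filled with "DEFAULT"
     * (an assumption made for this sketch).
     */
    public static List<String> reorderToTableOrder(List<String> tableColumns,
                                                   List<String> insertColumns,
                                                   List<String> values) {
        if (insertColumns.size() != values.size()) {
            throw new IllegalArgumentException("Column list has " + insertColumns.size()
                + " entries but VALUES has " + values.size());
        }
        List<String> reordered = new ArrayList<>();
        for (String column : tableColumns) {
            // Look the column up by name in the user-supplied column list.
            int idx = insertColumns.indexOf(column);
            reordered.add(idx >= 0 ? values.get(idx) : "DEFAULT");
        }
        return reordered;
    }

    public static void main(String[] args) {
        // Table t(a, b); INSERT (b, a) VALUES (u.a, u.b) means b gets u.a and a gets u.b.
        System.out.println(reorderToTableOrder(
            Arrays.asList("a", "b"),
            Arrays.asList("b", "a"),
            Arrays.asList("u.a", "u.b")));
    }
}
```

Once the values are in table order, an index-based step such as
replaceDefaultKeywordForMerge() would see the layout it already expects.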

ql/src/test/queries/clientpositive/sqlmerge_stats.q
Line 34 (original), 34 (patched)
<https://reviews.apache.org/r/68897/#comment293506>

Can you please add the following tests:

1. This should throw an error, since the values clause cardinality needs to 
match the column list cardinality:
MERGE into t as t using upd_t as u ON t.a = u.a 
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT (a) VALUES(u.a);

2. Column list in a different order than the target table:
merge into t as t using upd_t as u ON t.a = u.a 
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT (b, a) VALUES(u.a, u.b);

3. Assuming t's schema is create table t (a int, b default 1):
merge into t as t using upd_t as u ON t.a = u.a 
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT (b, a) VALUES(default, u.b);

4. Assuming t's schema is create table t (a int, b default 1):
merge into t as t using upd_t as u ON t.a = u.a 
WHEN MATCHED THEN update set b = default
WHEN NOT MATCHED THEN INSERT (b, a) VALUES(default, u.b);

- Ashutosh Chauhan

On Oct. 2, 2018, 1:26 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68897/
> ---
> 
> (Updated Oct. 2, 2018, 1:26 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20590
> https://issues.apache.org/jira/browse/HIVE-20590
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Allow merge statement to have column schema.
> 
> Also removed some unused code, and made the rewritten query more consistent 
> (upper case SQL keywords everywhere)
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 78bc87c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
> e8823e1 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCardinalityViolation.java
>  b688447 
>   ql/src/test/queries/clientpositive/sqlmerge_stats.q c480eb6 
>   ql/src/test/results/clientpositive/llap/sqlmerge_stats.q.out 02aa87a 
> 
> 
> Diff: https://reviews.apache.org/r/68897/diff/1/
> 
> 
> Testing
> ---
> 
> Tested on local cluster using the new syntax. Also modified a q file to use 
> the new syntax.
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>

Re: Review Request 68744: Add Surrogate Keys function to Hive

2018-09-18 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68744/#review208739
---




ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java
Lines 118 (patched)
<https://reviews.apache.org/r/68744/#comment292883>

taskId won't change during the lifecycle of this object in a task. So, you can 
move this line into configure(), save it in a field, and use it in evaluate().
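
A minimal sketch of that suggestion, assuming a key that packs the write id, the
task id and a row counter into one long. The class is hypothetical and does not
extend Hive's GenericUDF, and the bit widths are illustrative assumptions rather
than values taken from the patch.

```java
// Hypothetical stand-in for the UDF; shows per-task caching versus per-row work.
public class SurrogateKeySketch {
    // Assumed bit layout for illustration only: write id | task id | row id.
    static final int WRITE_ID_BITS = 24;
    static final int TASK_ID_BITS = 16;
    static final int ROW_ID_BITS = 24;

    private long writeId;
    private long taskId;   // constant for the lifetime of the task, so cached once
    private long rowId;    // the only value that changes from row to row

    /** Called once per task: validate and cache everything that is constant. */
    public void configure(long writeId, long taskId) {
        checkRange("writeId", writeId, WRITE_ID_BITS);
        checkRange("taskId", taskId, TASK_ID_BITS);
        this.writeId = writeId;
        this.taskId = taskId;
        this.rowId = 0;
    }

    /** Called once per row: reads the cached fields and advances the row counter. */
    public long evaluate() {
        checkRange("rowId", rowId, ROW_ID_BITS);
        long key = (writeId << (TASK_ID_BITS + ROW_ID_BITS))
                 | (taskId << ROW_ID_BITS)
                 | rowId;
        rowId++;
        return key;
    }

    private static void checkRange(String name, long value, int bits) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalStateException(
                name + " " + value + " does not fit in " + bits + " bits");
        }
    }

    public static void main(String[] args) {
        SurrogateKeySketch udf = new SurrogateKeySketch();
        udf.configure(1L, 2L);
        System.out.println(udf.evaluate());
        System.out.println(udf.evaluate());
    }
}
```

The range checks are the kind of limit a negative test could exercise, for example
by configuring a task id that needs more bits than the layout allows.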



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java
Lines 119 (patched)
<https://reviews.apache.org/r/68744/#comment292884>

I don't see a reason to have the lastTaskId field. Perhaps you can get rid of it.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java
Lines 124-125 (patched)
<https://reviews.apache.org/r/68744/#comment292885>

This can be moved to configure()



ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
Lines 55 (patched)
<https://reviews.apache.org/r/68744/#comment292886>

do udf.setWriteId(3) and call runAndVerifyConst() again to get coverage on 
writeId too.



ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
Lines 82 (patched)
<https://reviews.apache.org/r/68744/#comment292887>

Also add negative tests to catch the exception when we go over limits.


- Ashutosh Chauhan


On Sept. 18, 2018, 9:35 p.m., Miklos Gergely wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68744/
> ---
> 
> (Updated Sept. 18, 2018, 9:35 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20536
> https://issues.apache.org/jira/browse/HIVE-20536
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Add new function that allows the generation of a surrogate key composed of 
> the write id, the task id, and an incremental row id.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3f538b3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> 3309b9b 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 6d7e63e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSurrogateKey.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFSurrogateKey.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 8d41e78 
> 
> 
> Diff: https://reviews.apache.org/r/68744/diff/2/
> 
> 
> Testing
> ---
> 
> Added a new junit test for the function.
> Tested it in beeline by adding one row, adding multiple rows, and adding multiple 
> rows to multiple tables via multiple insert (all having their own 
> surrogate_key column)
> 
> 
> Thanks,
> 
> Miklos Gergely
> 
>

[jira] [Created] (HIVE-20572) Change default value of hive.tez.llap.min.reducer.per.executor

2018-09-17 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20572:
---

 Summary: Change default value of 
hive.tez.llap.min.reducer.per.executor
 Key: HIVE-20572
 URL: https://issues.apache.org/jira/browse/HIVE-20572
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-20571) Change default value of hive.tez.dag.status.check.interval

2018-09-17 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20571:
---

 Summary: Change default value of hive.tez.dag.status.check.interval
 Key: HIVE-20571
 URL: https://issues.apache.org/jira/browse/HIVE-20571
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-20558) Change default of hive.hashtable.key.count.adjustment to 0.99

2018-09-13 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20558:
---

 Summary:  Change default of hive.hashtable.key.count.adjustment to 
0.99
 Key: HIVE-20558
 URL: https://issues.apache.org/jira/browse/HIVE-20558
 Project: Hive
  Issue Type: Improvement
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan


Current default is 2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-20518) TxnHandler checkLock direct sql fail with ORA-01795 , if the table has more than 1000 partitions

2018-09-07 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20518:
---

 Summary: TxnHandler checkLock direct sql fail with ORA-01795 , if 
the table has more than 1000 partitions
 Key: HIVE-20518
 URL: https://issues.apache.org/jira/browse/HIVE-20518
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.1.0
Reporter: Rajkumar Singh
Assignee: Rajkumar Singh


With Oracle as the metastore database, TxnHandler checkLock fails with 
"checkLockWithRetry(181398,34773) : ORA-01795: maximum number of expressions in 
a list is 1000" if the table being written has more than 1000 partitions.
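
One common way around this Oracle limit is to split a long IN list into several
shorter lists joined with OR. The sketch below shows only that idea. It is not the
fix applied for this issue; the class, method and column names are hypothetical,
and the literals are assumed to be already quoted and escaped by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: keeps every IN list at or below a per-list maximum.
public class InClauseBatcher {

    /**
     * Builds "(col IN (...) OR col IN (...))" with at most maxPerList literals
     * in each IN list. Literals must already be safe to embed in SQL.
     */
    public static String buildInPredicate(String column, List<String> literals,
                                          int maxPerList) {
        if (literals.isEmpty() || maxPerList <= 0) {
            throw new IllegalArgumentException("need at least one literal and a positive limit");
        }
        StringJoiner predicate = new StringJoiner(" OR ", "(", ")");
        for (int start = 0; start < literals.size(); start += maxPerList) {
            int end = Math.min(start + maxPerList, literals.size());
            predicate.add(column + " IN ("
                + String.join(", ", literals.subList(start, end)) + ")");
        }
        return predicate.toString();
    }

    public static void main(String[] args) {
        List<String> partitions = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            partitions.add("'p=" + i + "'");
        }
        String predicate = buildInPredicate("part_col", partitions, 1000);
        // 2500 literals with a limit of 1000 yield three IN lists.
        System.out.println(predicate.split(" OR ").length);
    }
}
```

Where the driver allows it, bind parameters are preferable to embedded literals;
the batching logic stays the same.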

Complete stack trace:

{code}

txn.TxnHandler (TxnHandler.java:checkRetryable(2099)) - Non-retryable error in 
checkLockWithRetry(181398,34773) : ORA-01795: maximum number of expressions in 
a list is 1000

 (SQLState=42000, ErrorCode=1795)

2018-06-25 15:09:35,999 ERROR [pool-7-thread-197]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invokeInternal(203)) - MetaException(message:Unable to 
update transaction database java.sql.SQLSyntaxErrorException: ORA-01795: 
maximum number of expressions in a list is 1000

 

    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)

    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)

    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)

    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)

    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)

    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)

    at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:195)

    at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:876)

    at 
oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)

    at 
oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)

    at 
oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1498)

    at 
oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:406)

    at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)

    at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2649)

    at 
org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1126)

    at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:895)

    at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6123)

    at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)

    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
    at com.sun.proxy.$Proxy11.lock(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:12012)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:11996)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
)
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1131)
    at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:895)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6123)
    at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandle

Re: Review Request 68525: HIVE-20296 Improve HivePointLookupOptimizerRule to be able to extract from more sophisticated contexts

2018-09-05 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68525/#review208392
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
Lines 278 (patched)
<https://reviews.apache.org/r/68525/#comment292283>

Does this class also exist in Calcite? If so, can you please leave a note
here to remove it from Hive once these are made public in Calcite?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
Lines 307 (patched)
<https://reviews.apache.org/r/68525/#comment292284>

LOG.debug



ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
Lines 319 (patched)
<https://reviews.apache.org/r/68525/#comment292285>

Error msg : Unable to find constraint which was earlier added.


- Ashutosh Chauhan


On Aug. 27, 2018, 4:01 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68525/
> ---
> 
> (Updated Aug. 27, 2018, 4:01 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20296
> https://issues.apache.org/jira/browse/HIVE-20296
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> generalized rule to extract INs from more complex filter conditions as well.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HivePointLookupOptimizerRule.java
>  eff9a312aaa09a889f0c6a045bd813d1fb633956 
>   
> ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/rules/TestHivePointLookupOptimizerRule.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/druid/druidmini_test_ts.q.out 
> a8e6894a9786318ed4362ee37d918a3699f074a0 
>   ql/src/test/results/clientpositive/llap/bucketpruning1.q.out 
> 260ba1cbddee7f0946f0cdec1070359ab2a1d2aa 
>   ql/src/test/results/clientpositive/perf/spark/query15.q.out 
> 67684f6b0bc44c0cae6107be94d131a083eca0e1 
>   ql/src/test/results/clientpositive/perf/spark/query47.q.out 
> 690b1054c12f7d588015afc301802d9d2da2d0b9 
>   ql/src/test/results/clientpositive/perf/spark/query57.q.out 
> 51e644a87bf4befbe4368cdf03b5eaab6d4f2049 
>   ql/src/test/results/clientpositive/perf/tez/query15.q.out 
> e1eca99d95e13070f901b75a406391435b2b4f1d 
>   ql/src/test/results/clientpositive/perf/tez/query47.q.out 
> d034ea9433a3b1c54c545edf526c215bf79388e1 
>   ql/src/test/results/clientpositive/perf/tez/query57.q.out 
> 42cbbdc2a4d8bc469c9d91867faf21fc94a057ea 
> 
> 
> Diff: https://reviews.apache.org/r/68525/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>

[ANNOUNCE] New committer: Andrew Sherman

2018-08-28 Thread Ashutosh Chauhan

Apache Hive's Project Management Committee (PMC) has invited Andrew Sherman
to become a committer, and we are pleased to announce that he has accepted.

Andrew, welcome, thank you for your contributions, and we look forward to
your further interactions with the community!

Ashutosh Chauhan (on behalf of the Apache Hive PMC)

Re: Review Request 68313: HIVE-20366 TPC-DS query78 stats estimates are off for is null filter

2018-08-17 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68313/#review207573
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1700 (patched)
<https://reviews.apache.org/r/68313/#comment290972>

Let's s/dangling/unmatched/gc everywhere.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Lines 1703 (patched)
<https://reviews.apache.org/r/68313/#comment290973>

e.g., here



ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
Line 2162 (original), 2178 (patched)
<https://reviews.apache.org/r/68313/#comment290975>

Rename interimNumRows to unmatchedRows?



ql/src/test/results/clientpositive/annotate_stats_join.q.out
Line 901 (original), 901 (patched)
<https://reviews.apache.org/r/68313/#comment290976>

This looks incorrect both before and after. There are a few columns being
output, with 4 of them being join keys. 194/54 is less than 4 bytes per row.
Can you check how many nulls we predicted here and whether that's correct?
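
The arithmetic behind this comment can be sketched as below. This is only an illustration: it assumes that 194 is the estimated data size in bytes and 54 the estimated row count from the q.out file, and the class name, method names, and the 1-byte-per-column lower bound are hypothetical, not Hive code.

```java
// Hypothetical sanity check, not part of Hive: flags a stats estimate whose
// average row size is smaller than a lower bound implied by the column count.
final class RowSizeSanityCheck {

    // Average bytes per row implied by an estimated data size and row count.
    static double avgRowSize(long dataSizeBytes, long numRows) {
        if (numRows <= 0) {
            throw new IllegalArgumentException("numRows must be positive");
        }
        return (double) dataSizeBytes / numRows;
    }

    // True when the average row size is below numColumns * minBytesPerColumn.
    // minBytesPerColumn is an assumed lower bound chosen by the caller.
    static boolean looksUnderestimated(long dataSizeBytes, long numRows,
                                       int numColumns, int minBytesPerColumn) {
        return avgRowSize(dataSizeBytes, numRows)
                < (double) numColumns * minBytesPerColumn;
    }

    public static void main(String[] args) {
        // Values taken from the review comment: 194 bytes over 54 rows.
        System.out.println(avgRowSize(194, 54));
        // Even at 1 byte per join-key column, 4 columns need at least 4 bytes.
        System.out.println(looksUnderestimated(194, 54, 4, 1));
    }
}
```

Even with an unrealistically small bound of one byte per join-key column, the estimate falls short, which is why the data size looks suspicious.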



ql/src/test/results/clientpositive/llap/auto_join30.q.out
Line 172 (original), 172 (patched)
<https://reviews.apache.org/r/68313/#comment290977>

Data size is way underestimated. Can you verify?



ql/src/test/results/clientpositive/llap/subquery_notin.q.out
Line 5354 (original), 5354 (patched)
<https://reviews.apache.org/r/68313/#comment290978>

Data size is too low. Can you verify?


- Ashutosh Chauhan


On Aug. 12, 2018, 10:08 p.m., Vineet Garg wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68313/
> ---
> 
> (Updated Aug. 12, 2018, 10:08 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20366
> https://issues.apache.org/jira/browse/HIVE-20366
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Heuristic to estimate unmatched rows for outer joins
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
>  7682791f4d 
>   ql/src/test/results/clientpositive/annotate_stats_join.q.out b0d2b05ab0 
>   ql/src/test/results/clientpositive/llap/auto_join30.q.out 874511a112 
>   ql/src/test/results/clientpositive/llap/bucket_map_join_tez2.q.out 
> 205cd444b2 
>   ql/src/test/results/clientpositive/llap/check_constraint.q.out ec1ed64fe8 
>   ql/src/test/results/clientpositive/llap/correlationoptimizer1.q.out 
> 21b07b2c80 
>   ql/src/test/results/clientpositive/llap/explainuser_1.q.out a98191653f 
>   ql/src/test/results/clientpositive/llap/insert_into_default_keyword.q.out 
> 1a61c0e592 
>   ql/src/test/results/clientpositive/llap/join46.q.out b6ef9b184e 
>   ql/src/test/results/clientpositive/llap/join_emit_interval.q.out 9484b7ae0a 
>   ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out 
> f8ce1ce93e 
>   ql/src/test/results/clientpositive/llap/mapjoin3.q.out 7aa7318896 
>   ql/src/test/results/clientpositive/llap/mapjoin46.q.out 204e7755e5 
>   ql/src/test/results/clientpositive/llap/mapjoin_emit_interval.q.out 
> f6a1a6ee41 
>   ql/src/test/results/clientpositive/llap/skewjoinopt15.q.out cd20c3ab17 
>   ql/src/test/results/clientpositive/llap/subquery_in.q.out a045b12dc6 
>   ql/src/test/results/clientpositive/llap/subquery_multi.q.out a865ee9259 
>   ql/src/test/results/clientpositive/llap/subquery_notin.q.out f5f5f36aa3 
>   ql/src/test/results/clientpositive/llap/subquery_scalar.q.out 4423aec8a2 
>   ql/src/test/results/clientpositive/llap/subquery_select.q.out cf3d60f4b3 
>   ql/src/test/results/clientpositive/llap/tez_join_tests.q.out bf2f5a8548 
>   ql/src/test/results/clientpositive/llap/tez_joins_explain.q.out 72b84a0106 
>   ql/src/test/results/clientpositive/llap/tez_union.q.out 914ed47859 
>   ql/src/test/results/clientpositive/llap/unionDistinct_1.q.out f006e37b56 
>   ql/src/test/results/clientpositive/llap/vector_coalesce_3.q.out d05dd70206 
>   ql/src/test/results/clientpositive/llap/vector_groupby_mapjoin.q.out 
> 6443678f89 
>   ql/src/test/results/clientpositive/llap/vector_outer_join0.q.out 19e98f3f4a 
>   ql/src/test/results/clientpositive/llap/vector_outer_join1.q.out c74a588993 
>   ql/src/test/results/clientpositive/llap/vectorized_join46.q.out e03948f8b0 
>   ql/src/test/results/clientpositive/spark/annotate_stats_join.q.out 
> 7d45328d41 
>   ql/src/test/results/clientpositive/spark/spark_explainuser_1.q.out 
> 00f5d7ef11 
> 
> 
> Diff: https://reviews.apache.org/r/68313/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vineet Garg
> 
>

Re: Review Request 68337: HIVE-20379

2018-08-14 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68337/#review207250
---



2 minor comments. You may fix them on commit.


ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 12565 (patched)
<https://reviews.apache.org/r/68337/#comment290580>

It would be good to expand on why we are reordering.



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
Lines 12571 (patched)
<https://reviews.apache.org/r/68337/#comment290581>

unused.


- Ashutosh Chauhan


On Aug. 14, 2018, 12:37 a.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68337/
> ---
> 
> (Updated Aug. 14, 2018, 12:37 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20379
> https://issues.apache.org/jira/browse/HIVE-20379
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20379
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 361f150193a155d45eb64266f88eb88f0a881ad3 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 2ee562add907c2b57992df27ecbb4fd5e114cdba 
>   ql/src/test/queries/clientpositive/materialized_view_rewrite_part_2.q 
> 505f7507bc9adb25e544dc422164858fb20fed0e 
>   ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
> b12df11a98e55c00c8b77e8292666373f3509364 
>   
> ql/src/test/results/clientpositive/llap/materialized_view_partitioned_3.q.out 
> 726c660cf21d26cc5cc120d1397243958b49f834 
>   
> ql/src/test/results/clientpositive/llap/materialized_view_rewrite_part_1.q.out
>  492bb226fd03d51686e2040aed4766aef7150592 
>   
> ql/src/test/results/clientpositive/llap/materialized_view_rewrite_part_2.q.out
>  e748ccb010fc755f9d6af82c26c8ccffc26b1a55 
> 
> 
> Diff: https://reviews.apache.org/r/68337/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Re: Review Request 68261: HIVE-20332

2018-08-12 Thread Ashutosh Chauhan



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs, since
> > they are always insert-only? If so, we don't need a cost-based decision
> > there.
> > Also, can you remind me of an example of an MV containing an aggregate where
> > incremental rebuild via merge can be costlier?
> 
> Jesús Camacho Rodríguez wrote:
> bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs 
> since they are always insert only?
> Yes, it will always be cheaper.
> 
> bq. If so, we don't need cost based decision there. 
> I just thought we preferred to make rewriting decisions cost-based 
> instead of using Hep.
> 
> bq. Also, can you remind me of an example of an MV containing an aggregate
> where incremental rebuild via merge can be costlier?
> When there are many new rows and NDV for grouping columns is high: GBy 
> does not reduce the number of rows and MERGE may end up doing a lot of work 
> with OUTER JOIN + INSERT/UPDATE.
> 
> 
> We can use HepPlanner for incremental rebuild (it needs a minor extension 
> in Calcite and it should mostly work). Then if a rewriting is produced, 1) 
> for Project-Filter-Join MVs we always use it, and 2) for 
> Project-Filter-Join-Aggregate MVs make use of the heuristic.
> However, note that we will still need to introduce a parameter to be able 
> to tune the heuristic, right?
> If that is the case, we may introduce Hep for Project-Filter-Join MVs in
> a follow-up?
> 
> Ashutosh Chauhan wrote:
> From the changes in q.out, it looks like before this patch rewriting wasn't
> triggered even for PFJ cases. Why would that be the case? In those cases
> there are 2 candidate plans: one for full rebuild + overwrite, and another
> for full build with an additional predicate on writeId + insert into. This
> second plan should be cheaper because of the additional predicates. Why
> didn't we pick that before this patch?
> 
> Jesús Camacho Rodríguez wrote:
> The incremental rebuild works in two steps: 1) produce the partial
> rewriting using the MV, and 2) transform the rewriting into INSERT/MERGE
> depending on whether the MV contains an Aggregate or not. The costing is done
> over the partial rewriting. That is Union(MV contents, PFJ of new data), and
> in the case of containing an Aggregate it is Agg(Union(MV contents, PFJA of
> new data)).
> 
> The cost of the union input using the MV is already reduced using
> heuristics (we favour plans containing materialized views). However, the
> other input to the union is costed as usual. In both cases (with and without
> Aggregate), we may end up overestimating the number of rows coming through
> that input. If we estimate that the Filter condition over ROWID barely
> reduces the input number of rows, then it is easy to estimate that the Union
> rewriting will be more expensive, as new operators in the tree (e.g. an
> additional Project to remove that ROWID column or a separate Filter operator
> for ROWID) will add to the total cost because they need to process those rows.
> 
> Without this patch, here are the two plans for the simple MV that you
> mentioned (ignore the cpu cost, as that is only taken into account in case of
> a tie on cardinality):
> - FULL REBUILD:
> HiveProject(key=[$0], value=[$1])
>   HiveFilter(subset=[rel#2044:Subset#1.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
>     HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
> Total cost: {751.5 rows, 1253.5 cpu, 0.0 io}
> 
> - PARTIAL REWRITING (INC REBUILD):
> HiveUnion(all=[true])
>   HiveProject(subset=[rel#2071:Subset#6.HIVE.[]], key=[$0], value=[$1])
>     HiveFilter(subset=[rel#2069:Subset#5.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
>       HiveFilter(subset=[rel#2067:Subset#4.HIVE.[]], condition=[<(1, $4.writeid)])
>         HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
>   HiveProject(subset=[rel#2074:Subset#8.HIVE.[]], key=[$1], value=[$0])
>     HiveTableScan(subset=[rel#2072:Subset#7.HIVE.[]], table=[[default, partition_mv_1]], table:alias=[default.partition_mv_1])
> Total cost: {876.752276249 rows, 1378.75283625 cpu, 0.0 io}
> 
> (Btw, I can enable FilterMerge rule in the same loop as the MV rewriting, 
> but that will still not change outcome in many cases -Project for ROWID will 
> still add overhead- and will add to the optimization time).
> 
> Jesús Camacho Rodríguez wrote:
> The second one
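
The comparison between the two plans quoted above (row count first, CPU only as a tie-breaker) can be sketched in Java as follows. This is a minimal illustration, not Hive's actual cost model: `PlanCostSketch`, `Cost`, and `isCheaper` are hypothetical names, and treating a tie as exact equality of the row estimates is a simplifying assumption.

```java
import java.util.Comparator;

// Hypothetical sketch of a "rows first, cpu on a tie" plan cost comparison.
final class PlanCostSketch {

    // Simple immutable holder mirroring the {rows, cpu, io} triple in the plans.
    static final class Cost {
        final double rows;
        final double cpu;
        final double io;

        Cost(double rows, double cpu, double io) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
        }
    }

    // Orders costs by estimated rows; cpu is consulted only when rows are equal.
    static final Comparator<Cost> BY_ROWS_THEN_CPU =
            Comparator.comparingDouble((Cost c) -> c.rows)
                      .thenComparingDouble(c -> c.cpu);

    // True when plan a is strictly cheaper than plan b under this ordering.
    static boolean isCheaper(Cost a, Cost b) {
        return BY_ROWS_THEN_CPU.compare(a, b) < 0;
    }

    public static void main(String[] args) {
        // Totals copied from the two plans in the quoted message.
        Cost fullRebuild = new Cost(751.5, 1253.5, 0.0);
        Cost partialRewriting = new Cost(876.752276249, 1378.75283625, 0.0);
        System.out.println(isCheaper(fullRebuild, partialRewriting));
    }
}
```

With the quoted totals, the full rebuild wins on rows alone, which matches the explanation of why the incremental rewriting was not picked before the patch.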

Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Ashutosh Chauhan



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs, since
> > they are always insert-only? If so, we don't need a cost-based decision
> > there.
> > Also, can you remind me of an example of an MV containing an aggregate where
> > incremental rebuild via merge can be costlier?
> 
> Jesús Camacho Rodríguez wrote:
> bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs 
> since they are always insert only?
> Yes, it will always be cheaper.
> 
> bq. If so, we don't need cost based decision there. 
> I just thought we preferred to make rewriting decisions cost-based 
> instead of using Hep.
> 
> bq. Also, can you remind me of an example of an MV containing an aggregate
> where incremental rebuild via merge can be costlier?
> When there are many new rows and NDV for grouping columns is high: GBy 
> does not reduce the number of rows and MERGE may end up doing a lot of work 
> with OUTER JOIN + INSERT/UPDATE.
> 
> 
> We can use HepPlanner for incremental rebuild (it needs a minor extension 
> in Calcite and it should mostly work). Then if a rewriting is produced, 1) 
> for Project-Filter-Join MVs we always use it, and 2) for 
> Project-Filter-Join-Aggregate MVs make use of the heuristic.
> However, note that we will still need to introduce a parameter to be able 
> to tune the heuristic, right?
> If that is the case, we may introduce Hep for Project-Filter-Join MVs in 
> a follow-up?

From the changes in q.out, it looks like before this patch rewriting wasn't
triggered even for PFJ cases. Why would that be the case? In those cases there
are 2 candidate plans: one for full rebuild + overwrite, and another for full
build with an additional predicate on writeId + insert into. This second plan
should be cheaper because of the additional predicates. Why didn't we pick that
before this patch?


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68261/#review207113
---


On Aug. 8, 2018, 3:39 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68261/
> ---
> 
> (Updated Aug. 8, 2018, 3:39 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20332
> https://issues.apache.org/jira/browse/HIVE-20332
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20332
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 5bdcac88d0015d2410da050524e6697a22d83eb9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java
>  635d27e723dc1d260574723296f3484c26106a9c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewsRelMetadataProvider.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java
>  43f8508ffbf4ba3cc46016e1d300d6ca9c2e8ccb 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCumulativeCost.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java
>  80b939a9f65142baa149b79460b753ddf469aacf 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java
>  575902d78de2a7f95585c23a3c2fc03b9ce89478 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSize.java
>  97097381d9619e67bcab8a268d571d2a392485b3 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java
>  3bf62c535cec1e7a3eac43f0ce40879dbfc89799 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 361f150193a155d45eb64266f88eb88f0a881ad3 
>   ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
> b12df11a98e55c00c8b77e8292666373f3509364 
>   ql/src/test/results/clientpositive/llap/materialized_view_rebuild.q.out 
> 4d37d82b6e1f3d4ab8b76c391fa94176356093c2 
> 
> 
> Diff: https://reviews.apache.org/r/68261/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68261/#review207113
---



Isn't incremental rebuild always cheaper for Project-Filter-Join MVs, since they
are always insert-only? If so, we don't need a cost-based decision there.
Also, can you remind me of an example of an MV containing an aggregate where
incremental rebuild via merge can be costlier?

- Ashutosh Chauhan


On Aug. 8, 2018, 3:39 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68261/
> ---
> 
> (Updated Aug. 8, 2018, 3:39 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20332
> https://issues.apache.org/jira/browse/HIVE-20332
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20332
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 
> 5bdcac88d0015d2410da050524e6697a22d83eb9 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java
>  635d27e723dc1d260574723296f3484c26106a9c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewsRelMetadataProvider.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java
>  43f8508ffbf4ba3cc46016e1d300d6ca9c2e8ccb 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCumulativeCost.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java
>  80b939a9f65142baa149b79460b753ddf469aacf 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java
>  575902d78de2a7f95585c23a3c2fc03b9ce89478 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSize.java
>  97097381d9619e67bcab8a268d571d2a392485b3 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java
>  3bf62c535cec1e7a3eac43f0ce40879dbfc89799 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 
> 361f150193a155d45eb64266f88eb88f0a881ad3 
>   ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out 
> b12df11a98e55c00c8b77e8292666373f3509364 
>   ql/src/test/results/clientpositive/llap/materialized_view_rebuild.q.out 
> 4d37d82b6e1f3d4ab8b76c391fa94176356093c2 
> 
> 
> Diff: https://reviews.apache.org/r/68261/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

[jira] [Created] (HIVE-20364) Update default for hive.map.aggr.hash.min.reduction

2018-08-10 Thread Ashutosh Chauhan (JIRA)

Ashutosh Chauhan created HIVE-20364:
---

 Summary: Update default for hive.map.aggr.hash.min.reduction
 Key: HIVE-20364
 URL: https://issues.apache.org/jira/browse/HIVE-20364
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Nita Dembla
Assignee: Ashutosh Chauhan


The default value is 0.5. Let's update it to 0.99.
In the average case it's a trade-off between CPU and network. Erring on the
side of CPU is better, since the perf loss caused by network is usually larger.
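
As a rough illustration of what the threshold controls, here is a minimal Java sketch. It assumes the property is compared against the ratio of hash-table entries to input rows seen so far, with map-side hash aggregation turned off when the ratio exceeds the threshold. The class and method names are hypothetical and this is not Hive's implementation.

```java
// Hypothetical sketch of a min-reduction check for map-side hash aggregation.
final class MapAggrReductionCheck {

    // Returns true when hash aggregation should stay enabled: the hash table
    // holds few enough entries relative to the rows processed so far.
    static boolean keepHashAggregation(long rowsProcessed, long hashEntries,
                                       double minReduction) {
        if (rowsProcessed <= 0) {
            return true; // nothing observed yet, so no reason to disable
        }
        double ratio = (double) hashEntries / rowsProcessed;
        return ratio <= minReduction;
    }

    public static void main(String[] args) {
        // Example numbers are made up: 100000 rows collapsing to 70000 groups.
        System.out.println(keepHashAggregation(100_000, 70_000, 0.5));
        System.out.println(keepHashAggregation(100_000, 70_000, 0.99));
    }
}
```

Under this reading, a threshold of 0.99 keeps map-side aggregation on even when it removes only a small fraction of rows, spending CPU on the map side to reduce the data sent over the network.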



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
