date:20180811

[jira] [Created] (HIVE-20367) Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM

2018-08-11 Thread Matt McCline (JIRA)

Matt McCline created HIVE-20367:
---

 Summary: Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM
 Key: HIVE-20367
 URL: https://issues.apache.org/jira/browse/HIVE-20367
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Matt McCline
Assignee: Matt McCline


Add support for vectorizing PTF AVG, MAX, MIN, SUM when:

{noformat}
ROWS PRECEDING(MAX)~CURRENT
{noformat}
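A minimal sketch of what "streaming" means here. It reads the frame above as running from the first row of the partition to the current row, which is an interpretation of the notation. Under that frame, SUM, MIN, MAX and AVG each need only O(1) state (running sum, count, min, max), so each row's value can be emitted as soon as the row is seen, without buffering the partition. The class and method names are hypothetical and are not Hive's vectorized PTF evaluators. NULL handling is omitted.

```java
// Hypothetical sketch: streaming SUM, MIN, MAX and AVG over a frame that runs
// from the first row of the partition to the current row. Names are illustrative,
// not Hive's vectorized PTF classes; NULL handling is omitted for brevity.
final class RunningAggregates {
    private double sum;
    private double min;
    private double max;
    private long count;

    RunningAggregates() {
        reset();
    }

    // Call at the start of every PTF partition.
    void reset() {
        sum = 0.0;
        min = Double.POSITIVE_INFINITY;
        max = Double.NEGATIVE_INFINITY;
        count = 0;
    }

    // Consumes the first `size` values of a batch, in partition order, and writes,
    // for each row, the aggregate over all rows seen so far in the partition.
    // State is carried across calls, so one partition may span several batches.
    void processBatch(double[] values, int size,
                      double[] outSum, double[] outMin, double[] outMax, double[] outAvg) {
        for (int i = 0; i < size; i++) {
            double v = values[i];
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
            count++;
            outSum[i] = sum;
            outMin[i] = min;
            outMax[i] = max;
            outAvg[i] = sum / count;
        }
    }
}
```

Because the state survives between calls, a partition that spans two vectorized batches produces the same results as one large batch; reset() marks the partition boundary.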



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (HIVE-20368) Remove VectorTopNKeyOperator lock

2018-08-11 Thread Teddy Choi (JIRA)

Teddy Choi created HIVE-20368:
-

 Summary: Remove VectorTopNKeyOperator lock
 Key: HIVE-20368
 URL: https://issues.apache.org/jira/browse/HIVE-20368
 Project: Hive
  Issue Type: Bug
Reporter: Teddy Choi
Assignee: Teddy Choi


VectorTopNKeyOperator takes a lock at line 199, shown below.
{code:java}
priorityQueue.offer(WritableUtils.clone(keysWritable, getConfiguration()));
{code}
WritableUtils.clone calls Configuration.getClassByNameOrNull, which contains a
synchronized block, so every call contends on a shared lock. This line needs to run without locks.
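A minimal sketch of a lock-free key copy, assuming (for illustration only) that keys can be represented as byte arrays; the real operator holds Writable keys. Each admitted key is copied with Arrays.copyOf, which touches no shared synchronized state, unlike a Configuration-backed WritableUtils.clone. Assuming each operator instance (one per task thread) owns its own buffer, nothing here needs a lock. TopNKeyBuffer is a hypothetical name, not a Hive class.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch of a per-instance top-N key filter. Keys are byte arrays
// purely for illustration; copying uses Arrays.copyOf, so no shared lock is taken.
final class TopNKeyBuffer {
    private final int topN;
    // Max-heap: the head is the largest key currently kept, i.e. the next to evict.
    private final PriorityQueue<byte[]> heap;

    TopNKeyBuffer(int topN) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.topN = topN;
        this.heap = new PriorityQueue<>(topN, (a, b) -> Arrays.compareUnsigned(b, a));
    }

    // Returns true if the key may belong to the N smallest keys seen so far
    // (the row should be forwarded), false if the row can be filtered out.
    boolean offer(byte[] key) {
        if (heap.size() < topN) {
            // Private copy: the caller is free to reuse its key buffer afterwards.
            heap.offer(Arrays.copyOf(key, key.length));
            return true;
        }
        int cmp = Arrays.compareUnsigned(key, heap.peek());
        if (cmp > 0) {
            return false; // larger than every kept key
        }
        if (cmp < 0) {
            heap.poll();
            heap.offer(Arrays.copyOf(key, key.length));
        }
        return true; // newly admitted, or tied with the current boundary key
    }

    int size() {
        return heap.size();
    }
}
```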




[GitHub] hive pull request #414: HIVE-20368: Remove VectorTopNKeyOperator lock (Teddy...

2018-08-11 Thread pudidic

GitHub user pudidic opened a pull request:

https://github.com/apache/hive/pull/414

HIVE-20368: Remove VectorTopNKeyOperator lock (Teddy Choi)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pudidic/hive HIVE-20368

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #414


commit 149a613dcc94f90423ed1be2090ee87df8266a46
Author: Teddy Choi 
Date:   2018-08-11T14:29:52Z

HIVE-20368: Remove VectorTopNKeyOperator lock (Teddy Choi)




---

[jira] [Created] (HIVE-20369) TestPreUpgradeTool not run by ptest

2018-08-11 Thread Eugene Koifman (JIRA)

Eugene Koifman created HIVE-20369:
-

 Summary: TestPreUpgradeTool not run by ptest
 Key: HIVE-20369
 URL: https://issues.apache.org/jira/browse/HIVE-20369
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman


TestPreUpgradeTool is not showing up in ptest runs,
probably because the upgrade-acid module is disconnected from the root pom.

How does standalone-metastore work? It's also disconnected.

Also, the hive-upgrade jar does not show up in the tar produced by mvn package.




Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Ashutosh Chauhan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68261/#review207113
---



Isn't incremental rebuild always cheaper for Project-Filter-Join MVs, since they
are always insert-only? If so, we don't need a cost-based decision there.
Also, can you remind me of an example of an MV containing an aggregate where
incremental rebuild via MERGE can be costlier?

- Ashutosh Chauhan


On Aug. 8, 2018, 3:39 p.m., Jesús Camacho Rodríguez wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68261/
> ---
> 
> (Updated Aug. 8, 2018, 3:39 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-20332
> https://issues.apache.org/jira/browse/HIVE-20332
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20332
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5bdcac88d0015d2410da050524e6697a22d83eb9
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveDefaultRelMetadataProvider.java 635d27e723dc1d260574723296f3484c26106a9c
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveMaterializedViewsRelMetadataProvider.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java 43f8508ffbf4ba3cc46016e1d300d6ca9c2e8ccb
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdCumulativeCost.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdDistinctRowCount.java 80b939a9f65142baa149b79460b753ddf469aacf
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSelectivity.java 575902d78de2a7f95585c23a3c2fc03b9ce89478
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdSize.java 97097381d9619e67bcab8a268d571d2a392485b3
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdUniqueKeys.java 3bf62c535cec1e7a3eac43f0ce40879dbfc89799
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 361f150193a155d45eb64266f88eb88f0a881ad3
>   ql/src/test/results/clientpositive/llap/materialized_view_partitioned.q.out b12df11a98e55c00c8b77e8292666373f3509364
>   ql/src/test/results/clientpositive/llap/materialized_view_rebuild.q.out 4d37d82b6e1f3d4ab8b76c391fa94176356093c2
> 
> 
> Diff: https://reviews.apache.org/r/68261/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jesús Camacho Rodríguez
> 
>

Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Jesús Camacho Rodríguez



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs since 
> > they are always insert only? If so, we don't need cost based decision 
> > there. 
> > Also can you remind an  example for a MV containing aggregate where 
> > incremental rebuild via merge can be costlier?

bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs since 
they are always insert only?
Yes, it will always be cheaper.

bq. If so, we don't need cost based decision there. 
I just thought we preferred to make rewriting decisions cost-based instead of 
using Hep.

bq. Also can you remind an example for a MV containing aggregate where
incremental rebuild via merge can be costlier?
When there are many new rows and the NDV of the grouping columns is high, the GBy
does not reduce the number of rows, and the MERGE may end up doing a lot of work
with the OUTER JOIN + INSERT/UPDATE.


We can use HepPlanner for incremental rebuild (it needs a minor extension in
Calcite, and it should mostly work). Then, if a rewriting is produced, 1) for
Project-Filter-Join MVs we always use it, and 2) for
Project-Filter-Join-Aggregate MVs we make use of the heuristic.
However, note that we will still need to introduce a parameter to be able to
tune the heuristic, right?
If that is the case, could we introduce Hep for Project-Filter-Join MVs in a
follow-up?
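A minimal sketch of the policy proposed above, under stated assumptions: Project-Filter-Join rewritings are always used, and for aggregate MVs the MERGE is kept only while the GROUP BY actually shrinks the new data. The aggregated-row estimate min(newRows, groupingNdv) is an assumed stand-in for Hive's real cardinality estimate, and maxGroupingRatio is a hypothetical tuning parameter, not an existing HiveConf property.

```java
// Hypothetical sketch of the rebuild policy discussed in this thread; not Hive code.
final class RebuildPolicy {
    enum Strategy { FULL_REBUILD, INCREMENTAL_INSERT, INCREMENTAL_MERGE }

    // Hypothetical tuning knob (not a real HiveConf property): the largest fraction
    // of new rows that may survive the GROUP BY before MERGE is deemed too costly.
    private final double maxGroupingRatio;

    RebuildPolicy(double maxGroupingRatio) {
        if (!(maxGroupingRatio > 0.0 && maxGroupingRatio <= 1.0)) {
            throw new IllegalArgumentException("maxGroupingRatio must be in (0, 1]");
        }
        this.maxGroupingRatio = maxGroupingRatio;
    }

    Strategy choose(boolean rewritingProduced, boolean hasAggregate,
                    double newRows, double groupingNdv) {
        if (!rewritingProduced) {
            return Strategy.FULL_REBUILD;        // no incremental rewriting available
        }
        if (!hasAggregate) {
            return Strategy.INCREMENTAL_INSERT;  // Project-Filter-Join MV: insert-only, always used
        }
        if (newRows <= 0) {
            return Strategy.INCREMENTAL_MERGE;   // no new data, so MERGE has nothing to do
        }
        // A GROUP BY emits at most one row per distinct key and never more rows than it reads.
        double aggregatedRows = Math.min(newRows, groupingNdv);
        double survivingFraction = aggregatedRows / newRows;
        return survivingFraction <= maxGroupingRatio
                ? Strategy.INCREMENTAL_MERGE
                : Strategy.FULL_REBUILD;         // high NDV: MERGE would touch almost every new row
    }
}
```

The single ratio is only one possible shape for the tunable heuristic; a real implementation would presumably also weigh the MV size against the amount of new data.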


- Jesús



Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Ashutosh Chauhan



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs since 
> > they are always insert only? If so, we don't need cost based decision 
> > there. 
> > Also can you remind an  example for a MV containing aggregate where 
> > incremental rebuild via merge can be costlier?
> 
> Jesús Camacho Rodríguez wrote:
> bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs 
> since they are always insert only?
> Yes, it will always be cheaper.
> 
> bq. If so, we don't need cost based decision there. 
> I just thought we preferred to make rewriting decisions cost-based 
> instead of using Hep.
> 
> bq. Also can you remind an example for a MV containing aggregate where 
> incremental rebuild via merge can be costlier?
> When there are many new rows and NDV for grouping columns is high: GBy 
> does not reduce the number of rows and MERGE may end up doing a lot of work 
> with OUTER JOIN + INSERT/UPDATE.
> 
> 
> We can use HepPlanner for incremental rebuild (it needs a minor extension 
> in Calcite and it should mostly work). Then if a rewriting is produced, 1) 
> for Project-Filter-Join MVs we always use it, and 2) for 
> Project-Filter-Join-Aggregate MVs make use of the heuristic.
> However, note that we will still need to introduce a parameter to be able 
> to tune the heuristic, right?
> If that is the case, we may introduce Hep for Project-Filter-Join MVs in 
> a follow-up?

From the changes in the q.out files, it looks like before this patch the rewriting
wasn't triggered even for PFJ cases. Why would that be the case? In those cases
there are 2 candidate plans: one for full rebuild + overwrite, and another for full
build with an additional predicate on writeId + insert into. The second plan should
be cheaper because of the additional predicates. Why didn't we pick it before this
patch?


- Ashutosh



Re: [ANNOUNCE] New PMC Member : Vihang Karajgaonkar

2018-08-11 Thread Lefty Leverenz

Congratulations Vihang!

-- Lefty


On Wed, Aug 1, 2018 at 3:05 PM Vaibhav Gumashta wrote:

> Congrats Vihang!
>
> On 8/1/18, 11:33 AM, "Chaoyu Tang" wrote:
>
> Congratulations Vihang.
>
> On Tue, Jul 31, 2018 at 7:39 AM, Rajesh Balamohan <rbalamo...@apache.org> wrote:
>
> > Congratulations Vihang!
> >
> > ~Rajesh.B
> >
> >
> > On Tue, Jul 31, 2018 at 3:35 PM Marta Kuczora wrote:
> >
> > > Congratulations Vihang!
> > >
> > > On Mon, Jul 30, 2018 at 9:44 AM Peter Vary wrote:
> > >
> > > > Congratulations Vihang!
> > > >
> > > > > On Jul 29, 2018, at 22:32, Vineet Garg wrote:
> > > > >
> > > > > Congratulations Vihang!
> > > > >
> > > > >> On Jul 26, 2018, at 11:27 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> > > > >>
> > > > >> On behalf of the Hive PMC I am delighted to announce Vihang Karajgaonkar
> > > > >> is joining Hive PMC.
> > > > >> Thanks Vihang for all your contributions till now. Looking forward to many
> > > > >> more.
> > > > >>
> > > > >> Welcome, Vihang!
> > > > >>
> > > > >> Thanks,
> > > > >> Ashutosh

Re: [ANNOUNCE] New PMC Member : Peter Vary

2018-08-11 Thread Lefty Leverenz

Congratulations Peter!

-- Lefty


On Wed, Aug 1, 2018 at 3:05 PM Vaibhav Gumashta wrote:

> Congrats Peter!
>
> On 8/1/18, 11:31 AM, "Chaoyu Tang" wrote:
>
> Congratulations, Peter.
>
> On Wed, Aug 1, 2018 at 2:08 PM, Peter Vary wrote:
>
> > Thanks everyone!
> >
> > Rajesh Balamohan wrote (Tue, Jul 31, 2018, 13:41):
> >
> > > Congratulations Peter!
> > >
> > > ~Rajesh.B
> > >
> > >
> > > On Tue, Jul 31, 2018 at 3:58 PM Marta Kuczora wrote:
> > >
> > > > Congratulations Peter!
> > > >
> > > > On Mon, Jul 30, 2018 at 7:53 PM Andrew Sherman wrote:
> > > >
> > > > > Congratulations Peter!
> > > > >
> > > > > On Sun, Jul 29, 2018 at 1:32 PM Vineet Garg <vg...@hortonworks.com> wrote:
> > > > >
> > > > > > Congratulations Peter!
> > > > > >
> > > > > > > On Jul 26, 2018, at 11:25 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> > > > > > >
> > > > > > > On behalf of the Hive PMC I am delighted to announce Peter Vary is joining
> > > > > > > Hive PMC.
> > > > > > > Thanks Peter for all your contributions till now. Looking forward to many
> > > > > > > more.
> > > > > > >
> > > > > > > Welcome, Peter!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Ashutosh

Re: [ANNOUNCE] New PMC Member : Sahil Takiar

2018-08-11 Thread Lefty Leverenz

Congratulations Sahil!

-- Lefty


On Wed, Aug 1, 2018 at 3:05 PM Vaibhav Gumashta wrote:

> Congrats Sahil!
>
> On 8/1/18, 11:32 AM, "Chaoyu Tang" wrote:
>
> Congratulations Sahil!
>
> On Tue, Jul 31, 2018 at 7:40 AM, Rajesh Balamohan <rbalamo...@apache.org> wrote:
>
> > Congratulations Sahil!
> >
> > ~Rajesh.B
> >
> >
> > On Tue, Jul 31, 2018 at 3:57 PM Marta Kuczora wrote:
> >
> > > Congratulations Sahil!
> > >
> > > On Mon, Jul 30, 2018 at 9:44 AM Peter Vary wrote:
> > >
> > > > Congratulations Sahil!
> > > >
> > > > > On Jul 29, 2018, at 22:32, Vineet Garg wrote:
> > > > >
> > > > > Congratulations Sahil!
> > > > >
> > > > >> On Jul 26, 2018, at 11:28 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> > > > >>
> > > > >> On behalf of the Hive PMC I am delighted to announce Sahil Takiar is
> > > > >> joining Hive PMC.
> > > > >> Thanks Sahil for all your contributions till now. Looking forward to many
> > > > >> more.
> > > > >>
> > > > >> Welcome, Sahil!
> > > > >>
> > > > >> Thanks,
> > > > >> Ashutosh

Re: [ANNOUNCE] New PMC Member : Vineet Garg

2018-08-11 Thread Lefty Leverenz

Congratulations Vineet!

-- Lefty


On Wed, Aug 1, 2018 at 3:05 PM Vaibhav Gumashta wrote:

> Congrats Vineet!
>
> On 8/1/18, 11:33 AM, "Chaoyu Tang" wrote:
>
> Congratulations Vineet!
>
> On Tue, Jul 31, 2018 at 7:39 AM, Rajesh Balamohan <rajesh.balamo...@gmail.com> wrote:
>
> > Congratulations Vineet!
> >
> > ~Rajesh.B
> >
> >
> > On Tue, Jul 31, 2018 at 3:34 PM Marta Kuczora wrote:
> >
> > > Congratulations Vineet!
> > >
> > > On Mon, Jul 30, 2018 at 9:45 AM Peter Vary wrote:
> > >
> > > > Congratulations Vineet!
> > > >
> > > > > On Jul 30, 2018, at 01:59, Ashutosh Chauhan <hashut...@apache.org> wrote:
> > > > >
> > > > > On behalf of the Hive PMC I am delighted to announce Vineet Garg is joining
> > > > > Hive PMC.
> > > > > Thanks Vineet for all your contributions till now. Looking forward to many
> > > > > more.
> > > > >
> > > > > Welcome, Vineet!
> > > > >
> > > > > Thanks,
> > > > > Ashutosh
> >
> > --
> > ~Rajesh.B

Re: [ANNOUNCE] New committer: Slim Bouguerra

2018-08-11 Thread Lefty Leverenz

Congratulations Slim!

-- Lefty


On Tue, Jul 31, 2018 at 7:40 AM Rajesh Balamohan wrote:

> Congratulations Slim!
>
> ~Rajesh.B
>
> On Tue, Jul 31, 2018 at 3:34 PM Marta Kuczora wrote:
>
> > Congratulations Slim!
> >
> > On Mon, Jul 30, 2018 at 2:01 AM Ashutosh Chauhan wrote:
> >
> > > Apache Hive's Project Management Committee (PMC) has invited Slim Bouguerra
> > > to become a committer, and we are pleased to announce that he has accepted.
> > >
> > > Slim, welcome, thank you for your contributions, and we look forward to your
> > > further interactions with the community!
> > >
> > > Ashutosh Chauhan (on behalf of the Apache Hive PMC)

Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Jesús Camacho Rodríguez



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs since 
> > they are always insert only? If so, we don't need cost based decision 
> > there. 
> > Also can you remind an  example for a MV containing aggregate where 
> > incremental rebuild via merge can be costlier?
> 
> Jesús Camacho Rodríguez wrote:
> bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs 
> since they are always insert only?
> Yes, it will always be cheaper.
> 
> bq. If so, we don't need cost based decision there. 
> I just thought we preferred to make rewriting decisions cost-based 
> instead of using Hep.
> 
> bq. Also can you remind an example for a MV containing aggregate where 
> incremental rebuild via merge can be costlier?
> When there are many new rows and NDV for grouping columns is high: GBy 
> does not reduce the number of rows and MERGE may end up doing a lot of work 
> with OUTER JOIN + INSERT/UPDATE.
> 
> 
> We can use HepPlanner for incremental rebuild (it needs a minor extension 
> in Calcite and it should mostly work). Then if a rewriting is produced, 1) 
> for Project-Filter-Join MVs we always use it, and 2) for 
> Project-Filter-Join-Aggregate MVs make use of the heuristic.
> However, note that we will still need to introduce a parameter to be able 
> to tune the heuristic, right?
> If that is the case, we may introduce Hep for Project-Filter-Join MVs in 
> a follow-up?
> 
> Ashutosh Chauhan wrote:
> From changes in q.out looks like before this patch rewriting wasn't 
> triggered even for PFJ cases. Why would that be the case? In those cases 
> there are 2 candidate plans: one for full rebuild + overwrite and another 
> for full build with additional predicate on writeId + insert into. This 
> Second plan should be cheaper because of additional predicates. Why didn't we 
> pick that before this patch?

The incremental rebuild works in two steps: 1) produce the partial rewriting
using the MV, and 2) transform the rewriting into an INSERT or a MERGE, depending
on whether the MV contains an Aggregate or not. The costing is done over the partial
rewriting, that is, Union(MV contents, PFJ of new data), or, when the MV contains
an Aggregate, Agg(Union(MV contents, PFJA of new data)).

The cost of the Union input that uses the MV is already reduced by a heuristic
(we favour plans containing materialized views). However, the other input to
the Union is costed as usual. In both cases (with and without Aggregate), we may
end up overestimating the number of rows coming through that input. If we estimate
that the Filter condition over ROWID barely reduces the number of input rows, then
the Union rewriting is easily estimated as more expensive, because the new
operators in the tree (e.g. the additional Project that removes the ROWID column, or
the separate Filter operator for ROWID) add to the total cost: they need to process
all of those rows.
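To make the argument concrete, here is a toy cost model (an assumption, not Hive's cost model) in which a plan's cost is the total number of rows emitted by its operators. It models only the non-MV input of the Union, because the MV input is discounted separately, and it does not try to reproduce the totals printed in the plans that follow.

```java
// Toy cost model (an assumption, not Hive's cost model): a plan's cost is the
// sum of the rows emitted by each of its operators.
final class ToyRebuildCost {
    // Full rebuild: TableScan -> Filter(range predicate) -> Project.
    static double fullRebuild(double tableRows, double rangeSelectivity) {
        double scanned = tableRows;
        double filtered = scanned * rangeSelectivity;
        double projected = filtered;
        return scanned + filtered + projected;
    }

    // Non-MV input of the incremental Union:
    // TableScan -> Filter(writeId) -> Filter(range predicate) -> Project.
    // The MV input of the Union is deliberately not modelled here.
    static double newDataBranch(double tableRows, double writeIdSelectivity,
                                double rangeSelectivity) {
        double scanned = tableRows;
        double afterWriteId = scanned * writeIdSelectivity;
        double afterRange = afterWriteId * rangeSelectivity;
        double projected = afterRange;
        return scanned + afterWriteId + afterRange + projected;
    }
}
```

With an assumed 500-row table and a range selectivity of 0.25, the full rebuild costs 750. If the writeId filter is estimated to keep every row (selectivity 1.0), the new-data branch alone costs 1250, because the extra Filter re-emits all 500 rows, so the incremental plan loses before the MV input is even added. Only when the writeId filter is estimated as selective (e.g. 0.1, giving 575) does the branch become cheaper.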

Without this patch, here are the two plans for the simple MV that you mentioned
(ignore the cpu cost, as it is only taken into account to break ties in
cardinality):
- FULL REBUILD:
HiveProject(key=[$0], value=[$1])
  HiveFilter(subset=[rel#2044:Subset#1.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
    HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
Total cost: {751.5 rows, 1253.5 cpu, 0.0 io}

- PARTIAL REWRITING (INC REBUILD):
HiveUnion(all=[true])
  HiveProject(subset=[rel#2071:Subset#6.HIVE.[]], key=[$0], value=[$1])
    HiveFilter(subset=[rel#2069:Subset#5.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
      HiveFilter(subset=[rel#2067:Subset#4.HIVE.[]], condition=[<(1, $4.writeid)])
        HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
  HiveProject(subset=[rel#2074:Subset#8.HIVE.[]], key=[$1], value=[$0])
    HiveTableScan(subset=[rel#2072:Subset#7.HIVE.[]], table=[[default, partition_mv_1]], table:alias=[default.partition_mv_1])
Total cost: {876.752276249 rows, 1378.75283625 cpu, 0.0 io}

(Btw, I can enable the FilterMerge rule in the same loop as the MV rewriting, but
that will still not change the outcome in many cases, since the Project for ROWID
still adds overhead, and it will add to the optimization time.)
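FilterMerge collapses directly stacked Filters into a single Filter whose condition is the AND of both. The toy sketch below applies that idea to the shape of the incremental-rebuild branch above, using hypothetical plan-node classes (Calcite's real FilterMergeRule operates on RelNode and RexNode, not on these classes). Note that the Project stays in the tree, which is the remaining overhead mentioned above.

```java
// Hypothetical toy plan nodes; Calcite's FilterMergeRule works on RelNode/RexNode instead.
abstract class PlanNode {
}

final class ScanNode extends PlanNode {
    final String table;
    ScanNode(String table) { this.table = table; }
    @Override public String toString() { return "Scan(" + table + ")"; }
}

final class FilterNode extends PlanNode {
    final String condition;
    final PlanNode input;
    FilterNode(String condition, PlanNode input) { this.condition = condition; this.input = input; }
    @Override public String toString() { return "Filter(" + condition + ", " + input + ")"; }
}

final class ProjectNode extends PlanNode {
    final String exprs;
    final PlanNode input;
    ProjectNode(String exprs, PlanNode input) { this.exprs = exprs; this.input = input; }
    @Override public String toString() { return "Project(" + exprs + ", " + input + ")"; }
}

final class FilterMergeSketch {
    // Rewrites the tree bottom-up, collapsing directly stacked Filters into one
    // Filter whose condition is the AND of both conditions. Projects are kept as-is.
    static PlanNode apply(PlanNode node) {
        if (node instanceof ProjectNode) {
            ProjectNode p = (ProjectNode) node;
            return new ProjectNode(p.exprs, apply(p.input));
        }
        if (node instanceof FilterNode) {
            FilterNode top = (FilterNode) node;
            PlanNode child = apply(top.input);
            if (child instanceof FilterNode) {
                FilterNode bottom = (FilterNode) child;
                return new FilterNode("AND(" + top.condition + ", " + bottom.condition + ")",
                        bottom.input);
            }
            return new FilterNode(top.condition, child);
        }
        return node; // leaves (scans) are returned unchanged
    }
}
```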


- Jesús



Re: Review Request 68261: HIVE-20332

2018-08-11 Thread Jesús Camacho Rodríguez



> On Aug. 11, 2018, 7:45 p.m., Ashutosh Chauhan wrote:
> > Isn't incremental rebuild always cheaper for Project-Filter-Join MVs since 
> > they are always insert only? If so, we don't need cost based decision 
> > there. 
> > Also can you remind an  example for a MV containing aggregate where 
> > incremental rebuild via merge can be costlier?
> 
> Jesús Camacho Rodríguez wrote:
> bq. Isn't incremental rebuild always cheaper for Project-Filter-Join MVs 
> since they are always insert only?
> Yes, it will always be cheaper.
> 
> bq. If so, we don't need cost based decision there. 
> I just thought we preferred to make rewriting decisions cost-based 
> instead of using Hep.
> 
> bq. Also can you remind an example for a MV containing aggregate where 
> incremental rebuild via merge can be costlier?
> When there are many new rows and NDV for grouping columns is high: GBy 
> does not reduce the number of rows and MERGE may end up doing a lot of work 
> with OUTER JOIN + INSERT/UPDATE.
> 
> 
> We can use HepPlanner for incremental rebuild (it needs a minor extension 
> in Calcite and it should mostly work). Then if a rewriting is produced, 1) 
> for Project-Filter-Join MVs we always use it, and 2) for 
> Project-Filter-Join-Aggregate MVs make use of the heuristic.
> However, note that we will still need to introduce a parameter to be able 
> to tune the heuristic, right?
> If that is the case, we may introduce Hep for Project-Filter-Join MVs in 
> a follow-up?
> 
> Ashutosh Chauhan wrote:
> From changes in q.out looks like before this patch rewriting wasn't 
> triggered even for PFJ cases. Why would that be the case? In those cases 
> there are 2 candidate plans: one for full rebuild + overwrite and another 
> for full build with additional predicate on writeId + insert into. This 
> Second plan should be cheaper because of additional predicates. Why didn't we 
> pick that before this patch?
> 
> Jesús Camacho Rodríguez wrote:
> The incremental rebuild works in two steps: 1) produce the partial 
> rewriting using the MV, and 2) transform rewriting into INSERT/MERGE 
> depending on whether the MV contains Aggregate or not. The costing is done 
> over the partial rewriting. That is Union(MV contents, PFJ of new data), and 
> in the case of containing Aggregate it is Agg(Union(MV contents, PFJA of new 
> data)).
> 
> The cost of the union input using the MV is already reduced using 
> heuristics (we favour plans containing materialized views). However, the 
> other input to the union is costed as usual. In both cases (with and without 
> Aggregate), we may end up overestimating number of rows coming through that 
> input. If we estimate Filter condition over ROWID almost did not reduce input 
> number of rows, then it is easy to estimate that the Union rewriting will be 
> more expensive as new operators in the tree (e.g. additional Project to 
> remove that ROWID column or separate Filter operator for ROWID) will add to 
> the total cost because they need to process those rows.
> 
> Without this patch, here are the two plans for the simple mv that you 
> mentioned (ignore cpu cost as that is only taken into account in case of draw 
> for the cardinality):
> - FULL REBUILD:
> HiveProject(key=[$0], value=[$1])
>   HiveFilter(subset=[rel#2044:Subset#1.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
>     HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
> Total cost: {751.5 rows, 1253.5 cpu, 0.0 io}
> 
> - PARTIAL REWRITING (INC REBUILD):
> HiveUnion(all=[true])
>   HiveProject(subset=[rel#2071:Subset#6.HIVE.[]], key=[$0], value=[$1])
>     HiveFilter(subset=[rel#2069:Subset#5.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
>       HiveFilter(subset=[rel#2067:Subset#4.HIVE.[]], condition=[<(1, $4.writeid)])
>         HiveTableScan(subset=[rel#2042:Subset#0.HIVE.[]], table=[[default, src_txn]], table:alias=[src_txn])
>   HiveProject(subset=[rel#2074:Subset#8.HIVE.[]], key=[$1], value=[$0])
>     HiveTableScan(subset=[rel#2072:Subset#7.HIVE.[]], table=[[default, partition_mv_1]], table:alias=[default.partition_mv_1])
> Total cost: {876.752276249 rows, 1378.75283625 cpu, 0.0 io}
> 
> (Btw, I can enable FilterMerge rule in the same loop as the MV rewriting, 
> but that will still not change outcome in many cases -Project for ROWID will 
> still add overhead- and will add to the optimization time).

The second one (it was reformatted):

HiveUnion(all=[true])
  HiveProject(subset=[rel#2071:Subset#6.HIVE.[]], key=[$0], value=[$1])
    HiveFilter(subset=[rel#2069:Subset#5.HIVE.[]], condition=[AND(>(CAST($0):DOUBLE, 200), <(CAST($0):DOUBLE, 250))])
      HiveFilter(subset=[rel#2067:Subset#4.HIVE.[]], condition=[<(1, $4.writeid)])
        HiveTableSca
