[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-08-24 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183131#comment-17183131
 ] 

Krisztian Kasa commented on HIVE-23723:
---

Pushed to master, thanks [~amagyar]!

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-23 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142863#comment-17142863
 ] 

Attila Magyar commented on HIVE-23723:
--

[~jcamachorodriguez],

??Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it : If you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows, however you 
cannot guarantee the join will produce 5 rows at most.??

 

Got it, that should be kept indeed. However reason why additional reducers are 
introduced by the limittranspose implementation is not fully clear to me.

Do you think we should drop this patch as it's already implemented by the 
limittranspose, and focus on tweaking the existing implementation?

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-21 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141631#comment-17141631
 ] 

Jesus Camacho Rodriguez commented on HIVE-23723:


Iirc it was disabled by default because the rule pushes an exact limit, which 
means that it may result on introducing reducers throughout in the plan, which 
could result in additional stages (as you see in the plan above). Thus, it was 
only triggered via cost-based decision because if we were not filtering much 
data, it could result in regressions. Till we could explore this further and 
tune the cost-model, we decided to leave it disabled by default. Fwiw note that 
the rule can also push limit through other operators, e.g., union.
It would be great if we could enable the rule, identify the additionally 
created {{limit}} operators with a {{topn}} label, and pass the top-n 
information via hint to the Hive physical plan generation logic; this would 
also open a path to implement a way to being able to create {{topNKey}} 
operators from the SQL statement, as [~gopalv] suggested at some point. 
However, I understand this may be out of the scope of this patch.

Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it : If you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows, however you 
cannot guarantee how many you will produce at most. The {{Fetch Operator}} with 
limit is guaranteeing you get at most 5 rows, but since the match on the rule 
is a {{Limit}} operator, it could be anywhere in the plan, e.g., if CBO pushes 
limit operators through other operators.

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-19 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140588#comment-17140588
 ] 

Attila Magyar commented on HIVE-23723:
--

[~jcamachorodriguez], thanks for letting me know, I haven't realized it. Any 
idea why it is disabled by default? 

Also the plan looks different with limittranspose, not sure why. There are 3 
Limit operators. The first one is what was pushed through the LOJ. But there 
are 2 others in Reducer 2.

 
{code:java}
explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5; {code}
{code:java}
PREHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5
PREHOOK: type: QUERY
PREHOOK: Input: default@src
 A masked pattern was here 
POSTHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
 A masked pattern was here 
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Map 4 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: src1
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
Limit
  Number of rows: 5
  Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
null sort order: 
sort order: 
Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.3
value expressions: _col0 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Map 4 
Map Operator Tree:
TableScan
  alias: src2
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
Select Operator
  expressions: key (type: string), value (type: string)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Reducer 2 
Execution mode: vectorized, llap
Reduce Operator Tree:
  Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: VALUE._col0 (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE 
Column stats: COMPLETE
  Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
Reduce Output Operator
  key expressions: _col0 (type: string)
  null sort order: z
  sort order: +
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
Reducer 3 
Execution mode: llap
Reduce Operator Tree:
  Merge Join Operator
condition map:
 Left Outer Join 0 to 1
keys:
  

[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139971#comment-17139971
 ] 

Jesus Camacho Rodriguez commented on HIVE-23723:


[~amagyar], this is available through CBO ({{hive.optimize.limittranspose}}). 
Should we enable that rule instead of implementing a new transformation rule on 
the Hive side?

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)