[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183131#comment-17183131 ] Krisztian Kasa commented on HIVE-23723: --- Pushed to master, thanks [~amagyar]! > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142863#comment-17142863 ] Attila Magyar commented on HIVE-23723: -- [~jcamachorodriguez], ??Concerning your patch, it seems you are removing the original limit on top of the left outer join? Note that you cannot remove it : If you have 5 input rows on the left side, you know the LOJ will produce at least 5 rows, however you cannot guarantee the join will produce 5 rows at most.?? Got it, that should be kept indeed. However reason why additional reducers are introduced by the limittranspose implementation is not fully clear to me. Do you think we should drop this patch as it's already implemented by the limittranspose, and focus on tweaking the existing implementation? > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141631#comment-17141631 ] Jesus Camacho Rodriguez commented on HIVE-23723: Iirc it was disabled by default because the rule pushes an exact limit, which means that it may result on introducing reducers throughout in the plan, which could result in additional stages (as you see in the plan above). Thus, it was only triggered via cost-based decision because if we were not filtering much data, it could result in regressions. Till we could explore this further and tune the cost-model, we decided to leave it disabled by default. Fwiw note that the rule can also push limit through other operators, e.g., union. It would be great if we could enable the rule, identify the additionally created {{limit}} operators with a {{topn}} label, and pass the top-n information via hint to the Hive physical plan generation logic; this would also open a path to implement a way to being able to create {{topNKey}} operators from the SQL statement, as [~gopalv] suggested at some point. However, I understand this may be out of the scope of this patch. Concerning your patch, it seems you are removing the original limit on top of the left outer join? Note that you cannot remove it : If you have 5 input rows on the left side, you know the LOJ will produce at least 5 rows, however you cannot guarantee how many you will produce at most. The {{Fetch Operator}} with limit is guaranteeing you get at most 5 rows, but since the match on the rule is a {{Limit}} operator, it could be anywhere in the plan, e.g., if CBO pushes limit operators through other operators. > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140588#comment-17140588 ] Attila Magyar commented on HIVE-23723: -- [~jcamachorodriguez], thanks for letting me know, I haven't realized it. Any idea why it is disabled by default? Also the plan looks different with limittranspose, not sure why. There are 3 Limit operators. The first one is what was pushed through the LOJ. But there are 2 others in Reducer 2. {code:java} explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5; {code} {code:java} PREHOOK: query: explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5 PREHOOK: type: QUERY PREHOOK: Input: default@src A masked pattern was here POSTHOOK: query: explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5 POSTHOOK: type: QUERY POSTHOOK: Input: default@src A masked pattern was here STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1STAGE PLANS: Stage: Stage-1 Tez A masked pattern was here Edges: Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) Reducer 3 <- Map 4 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE) A masked pattern was here Vertices: Map 1 Map Operator Tree: TableScan alias: src1 Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: string) outputColumnNames: _col0 Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 5 Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator null sort order: sort order: Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE TopN Hash Memory Usage: 0.3 value expressions: _col0 (type: string) Execution mode: vectorized, llap LLAP IO: no inputs Map 4 Map Operator Tree: TableScan alias: src2 filterExpr: key is not null (type: boolean) Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: key is not null (type: boolean) Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: key (type: string), value (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string) null sort order: z sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string) Execution mode: vectorized, llap LLAP IO: no inputs Reducer 2 Execution mode: vectorized, llap Reduce Operator Tree: Limit Number of rows: 5 Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: VALUE._col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE Limit Number of rows: 5 Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: string) null sort order: z sort order: + Map-reduce partition columns: _col0 (type: string) Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE Reducer 3 Execution mode: llap Reduce Operator Tree: Merge Join Operator condition map: Left Outer Join 0 to 1 keys:
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139971#comment-17139971 ] Jesus Camacho Rodriguez commented on HIVE-23723: [~amagyar], this is available through CBO ({{hive.optimize.limittranspose}}). Should we enable that rule instead of implementing a new transformation rule on the Hive side? > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)