Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/10454#discussion_r48388339
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -858,6 +859,30 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
}
/**
+ * Push [[Limit]] operators through [[Join]] operators, iff the join type is an outer join.
+ * Adds extra [[Limit]] operators on top of the outer-side child/children.
+ */
+object PushLimitThroughOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
--- End diff ---
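For context, a rule of the shape described in the Scaladoc above might look roughly like the following. This is a minimal sketch against the Spark 1.6-era Catalyst API, not the PR's actual implementation; in particular, the `isInstanceOf[Limit]` guards are an assumption I added to keep the rewrite from re-firing on its own output.

```scala
import org.apache.spark.sql.catalyst.plans.{LeftOuter, RightOuter}
import org.apache.spark.sql.catalyst.plans.logical.{Join, Limit, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: push a Limit below the row-preserving (outer) side of an
// outer join, keeping the original Limit on top.
object PushLimitThroughOuterJoin extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // LEFT OUTER: every left row appears at least once in the output,
    // so at most `exp` left rows are needed to produce `exp` output rows.
    case Limit(exp, Join(left, right, LeftOuter, cond)) if !left.isInstanceOf[Limit] =>
      Limit(exp, Join(Limit(exp, left), right, LeftOuter, cond))
    // RIGHT OUTER: symmetric, limit the right child instead.
    case Limit(exp, Join(left, right, RightOuter, cond)) if !right.isInstanceOf[Limit] =>
      Limit(exp, Join(left, Limit(exp, right), RightOuter, cond))
  }
}
```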
I am not sure what the best way to prove it is. If we can add an extra `limit` below `union all`, `left outer`, and `right outer`, can we also add an extra `limit` below `full outer`? In traditional RDBMSs, `full outer = union all(left outer, right outer)`. I am not sure whether Spark SQL has the same semantics.
`(A full outer join B) limit 5`
= `((A left outer join B) limit 5) union all ((A right outer join B) limit 5)`
= `((A limit 5) left outer join (B limit 5)) union all ((A limit 5) right outer join (B limit 5))`
= `((A limit 5) full outer join (B limit 5))`
However, inner joins have a real problem if we try to add an extra `limit`: the rows kept by a pushed-down `limit` might be exactly the ones with no match on the other side, so the join could return fewer rows than the original query. I am not sure my answer is clear; the sketch below tries to make the inner-join case concrete.
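Here is a hypothetical illustration of the inner-join hazard. The tables `a` and `b` and the 1.6-style `sqlContext` are assumptions for the example, not anything from the PR itself.

```scala
// Why pushing limit below an INNER join is unsafe: the limited side may
// keep exactly the rows that have no match. Assumes a SQLContext `sqlContext`
// is in scope (e.g. the Spark 1.6 shell).
import sqlContext.implicits._

val a = (1 to 10).map(Tuple1.apply).toDF("id")   // A = {1..10}
val b = (6 to 10).map(Tuple1.apply).toDF("id")   // B = {6..10}

// Correct plan: join first, then limit => up to 5 matching rows.
val good = a.join(b, "id").limit(5)

// Rewritten plan: if a.limit(5) happens to keep {1..5}, nothing matches B
// and the query returns 0 rows instead of 5. Note that limit makes no
// promise about WHICH rows it keeps. For an outer join, the limited
// (row-preserving) side still emits every kept row, which is why the
// rewrite is safe there but not here.
val bad = a.limit(5).join(b, "id").limit(5)
```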