Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10451#issuecomment-167678849
A few updates have been made, but I am not sure whether the changes are appropriate.
1. `limit` is already used in
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L251.
Thus, I had to choose a different function name, `limitedNumRows`.
2. The new changes do not resolve the issue found by @cloud-fan: "If the
left or right is an operator that can push Limit down (currently there is no
such operator, but we can't guarantee there won't be)." To resolve this issue,
we have three options:
* compute the value of `limitedNumRows` for each `LogicalPlan` operator
(except `LeafNode`) based on its children's `limitedNumRows`. However, we are
unable to do that for some operators (e.g., `Join` and `Generate`). That means
that even where we can correctly compute the value of `limitedNumRows`, we are
unable to detect whether the Limit has been pushed down if the child plan
contains these nodes (see the first sketch after this list).
* write a recursive function that traverses all the child nodes based on
their operator types. This might be over-engineered for this case.
* add a comment explaining the current solution. If we add such an
operator in the future, we can change the code and fix the issue then.
3. The current approach does not add an extra `Limit` if the user has already
added a `Limit` to both child nodes (see the second sketch after this list).
This is shown in the newly added test case:
https://github.com/gatorsmile/spark/blob/6998ec9d091260c63b40964997126838812bbf03/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationPushDownSuite.scala#L83-L91
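To make points 1 and 2 concrete, here is a minimal sketch of how a
per-operator `limitedNumRows` could be derived. Everything in it is
illustrative: the `Option[Long]` return type, the extractor-based matching,
and the exact set of operators handled are my assumptions for this sketch,
not the code in this PR.

```scala
import org.apache.spark.sql.catalyst.expressions.IntegerLiteral
import org.apache.spark.sql.catalyst.plans.logical._

// Hypothetical helper: derive an upper bound on the number of rows a plan
// can produce by propagating the bound up from its children where safe.
object LimitedNumRows {
  def apply(plan: LogicalPlan): Option[Long] = plan match {
    // A Limit with a literal value bounds its output directly.
    case Limit(IntegerLiteral(n), _) => Some(n.toLong)
    // Operators that never add rows can reuse their child's bound.
    case f: Filter => apply(f.child)
    case p: Project => apply(p.child)
    // For Join and Generate the output size is not bounded by the children's
    // bounds, so no value can be derived. This is why, once such nodes appear
    // in the child plan, we cannot tell whether a Limit was pushed down.
    case _: Join | _: Generate => None
    case _ => None
  }
}
```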
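And a rough sketch of the behavior described in point 3: push the `Limit`
into the children of a `Union` only when a child is not already limited to
the same number of rows or fewer. Again, this is only a sketch assuming the
binary `Union(left, right)` shape discussed above, not the exact rule in
this PR.

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, IntegerLiteral}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: push Limit into both sides of a Union, skipping a child
// that is already wrapped in a Limit of the same or a smaller literal value,
// so no redundant Limit is added (the case in the test linked above).
object PushLimitThroughUnion extends Rule[LogicalPlan] {
  private def maybeLimit(exp: Expression, child: LogicalPlan): LogicalPlan =
    (exp, child) match {
      case (IntegerLiteral(n), Limit(IntegerLiteral(m), _)) if m <= n => child
      case _ => Limit(exp, child)
    }

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Limit(exp, Union(left, right)) =>
      Limit(exp, Union(maybeLimit(exp, left), maybeLimit(exp, right)))
  }
}
```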
Please feel free to let me know if I need to make any changes.