Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10451#issuecomment-167678849
  
    A few updates are done, but I am not sure if the changes are appropriate. 
    
    1. `limit` is already used in 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L251.
 Thus, I have to choose a different function name `limitedNumRows`
    2. The new changes did not resolve the issue found by @cloud-fan . "If the 
left or right is an operator that can push Limit down(currently there is no 
such operator, but we can't guarantee there won't be)." To resolve this issue, 
we are facing three options: 
      * generate the value of `limitedNumRows` for each logicalPlan operators 
(except LeafNode) based on their children's `limitedNumRows`. However, we are 
unable to do that for some operators (e.g., `Join` and `Generate`) That means, 
although we can correctly get the value of `limitedNumRows`, we are unable to 
detect if the Limit has been pushed down if the child plan have these nodes.
      * write a recursive function for traversing all the child nodes based on 
their operator types. This might be over-engineered for this case. 
      * add a comment and explain the current solution. In the future, if we 
add such an operator, we can change the current code and fix the issue?
    3. The current way will not add extra `Limit` if user already added `Limit` 
in both child nodes. This is shown in the newly added test case 
https://github.com/gatorsmile/spark/blob/6998ec9d091260c63b40964997126838812bbf03/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationPushDownSuite.scala#L83-L91
    
    Please feel free let me know if I need to do any change. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to