[GitHub] spark pull request: [SPARK-8964] [SQL] Use Exchange to perform shu...

JoshRosen Thu, 04 Feb 2016 18:48:43 -0800

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7334#discussion_r51971305
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -337,8 +337,12 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
             execution.Sample(lb, ub, withReplacement, seed, planLater(child)) 
:: Nil
           case logical.LocalRelation(output, data) =>
             LocalTableScan(output, data) :: Nil
    +      case logical.ReturnAnswer(logical.Limit(IntegerLiteral(limit), 
child)) =>
    +        execution.CollectLimit(limit, planLater(child)) :: Nil
           case logical.Limit(IntegerLiteral(limit), child) =>
    -        execution.Limit(limit, planLater(child)) :: Nil
    +        val perPartitionLimit = execution.LocalLimit(limit, 
planLater(child))
    --- End diff --
    
    @gatorsmile: I think we're in agreement. To recap:
    
    Before this patch, `   select * from (select * from A limit 10 ) t1 UNION 
ALL (select * from B limit 10) t2` should get planned as:
    
    ```
              âââââââââââ         
              â  Union  â         
              âââââââââââ         
                   â²              
          ââââââââââ´âââââââââ     
          â                 â     
    ââââââââââââ      
ââââââââââââ
    â Limit 10 â      â Limit 10 â
    ââââââââââââ      
ââââââââââââ
          â²                 â²     
          â                 â     
    ââââââââââââ      
ââââââââââââ
    â  Scan A  â      â  Scan B  â
    ââââââââââââ      
ââââââââââââ
    ```
    
    Afterwards, this becomes
    
    ```
                âââââââââââ             
                â  Union  â             
                âââââââââââ             
                     â²                  
            ââââââââââ´âââââââââââ     
  
            â                   â       
    âââââââââââââââââ   
ââââââââââââââââ
    âGlobalLimit 10 â   âGlobalLimit 10â
    âââââââââââââââââ   
ââââââââââââââââ
            â²                   â²       
            â                   â       
    ââââââââââââââââ    
ââââââââââââââââ
    âLocalLimit 10 â    â LocaLimit 10 â
    ââââââââââââââââ    
ââââââââââââââââ
            â²                   â²       
            â                   â       
      ââââââââââââ        
ââââââââââââ  
      â  Scan A  â        â  Scan B  â  
      ââââââââââââ        
ââââââââââââ  
    ```
    
    What is **not** legal to do here is to pull the `GlobalLimit` up, so the 
following would be wrong:
    
    ```
             âââââââââââââââââ          
             âGlobalLimit 10 â          
             âââââââââââââââââ          
                     â²                  
                     â                  
                âââââââââââ             
                â  Union  â             
                âââââââââââ             
                     â²                  
            ââââââââââ´âââââââââââ     
  
            â                   â       
    ââââââââââââââââ    
ââââââââââââââââ
    âLocalLimit 10 â    â LocaLimit 10 â
    ââââââââââââââââ    
ââââââââââââââââ
            â²                   â²       
            â                   â       
      ââââââââââââ        
ââââââââââââ  
      â  Scan A  â        â  Scan B  â  
      ââââââââââââ        
ââââââââââââ  
    ```
    
    That plan would be semantically equivalent to executing
    
    ```
      select * from (select * from A ) t1 UNION ALL (select * from B) t2 LIMIT 
10
    ```
    
    @davies, were you suggesting that the current planning is wrong? Or that we 
need more tests to guard against incorrect changes to limit planning? I don't 
believe that the changes in this patch will affect the planning of the case 
being described here, since we're not making any changes to limit pull-up or 
push-down. I _do_ have a followup patch in the works which takes @gatorsmile's 
two limit-pushdown patches and rebases them on top of the changes here: 
https://github.com/apache/spark/compare/master...JoshRosen:limit-pushdown-2. In 
that patch, I do plan to add more tests to handle these pushdown-related 
concerns.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-8964] [SQL] Use Exchange to perform shu...

Reply via email to