Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7334#discussion_r51971305
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -337,8 +337,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
             execution.Sample(lb, ub, withReplacement, seed, planLater(child)) :: Nil
           case logical.LocalRelation(output, data) =>
             LocalTableScan(output, data) :: Nil
    +      case logical.ReturnAnswer(logical.Limit(IntegerLiteral(limit), child)) =>
    +        execution.CollectLimit(limit, planLater(child)) :: Nil
           case logical.Limit(IntegerLiteral(limit), child) =>
    -        execution.Limit(limit, planLater(child)) :: Nil
    +        val perPartitionLimit = execution.LocalLimit(limit, planLater(child))
    --- End diff --
    
    @gatorsmile: I think we're in agreement. To recap:
    
    Before this patch, `select * from (select * from A limit 10) t1 UNION ALL (select * from B limit 10) t2` would get planned as:
    
    ```
              ┌─────────┐
              │  Union  │
              └─────────┘
                   ▲
          ┌────────┴────────┐
          │                 │
    ┌──────────┐      ┌──────────┐
    │ Limit 10 │      │ Limit 10 │
    └──────────┘      └──────────┘
          ▲                 ▲
          │                 │
    ┌──────────┐      ┌──────────┐
    │  Scan A  │      │  Scan B  │
    └──────────┘      └──────────┘
    ```
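
    (As a quick way to sanity-check how queries like this get planned, you can look at the `explain()` output. A minimal sketch, assuming a Spark 1.6-era `SQLContext` with `A` and `B` registered as temp tables; nothing here is from the patch itself.)
    
    ```
    // Hypothetical sanity check: print the physical plan and look for the limit
    // operators. Assumes A and B are registered temp tables.
    val df = sqlContext.sql(
      """SELECT * FROM (SELECT * FROM A LIMIT 10) t1
        |UNION ALL
        |SELECT * FROM (SELECT * FROM B LIMIT 10) t2""".stripMargin)
    df.explain()  // physical plan only; use explain(true) to also see the logical plan
    ```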
    
    Afterwards, this becomes:
    
    ```
                ┌─────────┐
                │  Union  │
                └─────────┘
                     ▲
            ┌────────┴──────────┐
            │                   │
    ┌───────────────┐   ┌──────────────┐
    │GlobalLimit 10 │   │GlobalLimit 10│
    └───────────────┘   └──────────────┘
            ▲                   ▲
            │                   │
    ┌──────────────┐    ┌──────────────┐
    │LocalLimit 10 │    │LocalLimit 10 │
    └──────────────┘    └──────────────┘
            ▲                   ▲
            │                   │
      ┌──────────┐        ┌──────────┐
      │  Scan A  │        │  Scan B  │
      └──────────┘        └──────────┘
    ```
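
    (To spell out what the two operators mean, here's a rough RDD-level sketch of my own, not the actual `execution.LocalLimit` / `execution.GlobalLimit` implementations: the local limit trims each partition to at most `n` rows, and the global limit then takes at most `n` rows from the combined result. For simplicity the sketch just collects to the driver for the global step.)
    
    ```
    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD
    
    // Rough illustration only -- not the real physical operators.
    // "Local" limit: keep at most n rows from each partition.
    def localLimit[T: ClassTag](rdd: RDD[T], n: Int): RDD[T] =
      rdd.mapPartitions(_.take(n), preservesPartitioning = true)
    
    // "Global" limit: keep at most n rows overall (collected to the driver here,
    // purely for the sake of the sketch).
    def globalLimit[T](rdd: RDD[T], n: Int): Array[T] =
      rdd.take(n)
    ```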
    
    What is **not** legal is to pull the `GlobalLimit` up above the `Union`, so 
    the following plan would be wrong:
    
    ```
             ┌───────────────┐
             │GlobalLimit 10 │
             └───────────────┘
                     ▲
                     │
                ┌─────────┐
                │  Union  │
                └─────────┘
                     ▲
            ┌────────┴──────────┐
            │                   │
    ┌──────────────┐    ┌──────────────┐
    │LocalLimit 10 │    │LocalLimit 10 │
    └──────────────┘    └──────────────┘
            ▲                   ▲
            │                   │
      ┌──────────┐        ┌──────────┐
      │  Scan A  │        │  Scan B  │
      └──────────┘        └──────────┘
    ```
    
    That plan would be semantically equivalent to executing
    
    ```
      select * from (select * from A) t1 UNION ALL (select * from B) t2 LIMIT 10
    ```
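
    (Concretely: suppose A and B each have 20 rows, a number invented just for illustration. With the limits inside each branch you get 20 rows back; with a single limit above the union you get only 10. A tiny plain-Scala stand-in:)
    
    ```
    // Plain Scala collections standing in for tables A and B, 20 rows each.
    val a = (1 to 20).map(i => s"a$i")
    val b = (1 to 20).map(i => s"b$i")
    
    // Limit applied inside each branch of the union (the legal plan): 10 + 10 rows.
    val limitPerBranch = a.take(10) ++ b.take(10)
    
    // Limit pulled up above the union (the incorrect rewrite): only 10 rows total.
    val limitAboveUnion = (a ++ b).take(10)
    
    assert(limitPerBranch.size == 20)
    assert(limitAboveUnion.size == 10)
    ```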
    
    @davies, were you suggesting that the current planning is wrong, or that we 
    need more tests to guard against incorrect changes to limit planning? I don't 
    believe that the changes in this patch affect the planning of the case 
    described here, since we're not changing limit pull-up or push-down. I _do_ 
    have a follow-up patch in the works which takes @gatorsmile's two 
    limit-pushdown patches and rebases them on top of the changes here: 
    https://github.com/apache/spark/compare/master...JoshRosen:limit-pushdown-2. 
    In that patch, I plan to add more tests covering these pushdown-related 
    concerns.

