Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6780#discussion_r32367193
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -154,9 +170,6 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Join(left, prunedChild(right, allReferences), LeftSemi, condition)
     
    -    case Project(projectList, Limit(exp, child)) =>
    --- End diff --
    
    I don't think we want to remove this rule. It's generally good to push
    projections down, because we might be able to continue pushing them down
    further. Instead, I propose we do two things:
     - Add a rule that pushes Project beneath Sort when possible (i.e. when we
       aren't projecting away the Sort attributes).
     - Make the `TakeOrdered` strategy and operator more general, so that it
       can optionally handle a projection when necessary. This projection can
       be done on the driver after the `rdd.take(...)`. The reasoning here is
       that Project is one of the few 1-1 nodes and is a very common case, so
       we probably want to be able to execute `limit -> project -> sort`
       efficiently.
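
To make the two proposals concrete, here is a rough sketch on a toy plan model. Everything in it is an assumption made for illustration: `Plan`, `Relation`, `Sort`, `Project` and `Limit` are simplified stand-ins rather than Catalyst's classes, projections are modeled as plain column-name pruning (no expressions or aliases), and `takeOrderedAndProject` is a hypothetical helper working on local collections instead of an RDD.

```scala
// Toy plan nodes (hypothetical stand-ins, not Catalyst's LogicalPlan classes).
sealed trait Plan { def output: Seq[String] }
case class Relation(output: Seq[String]) extends Plan
case class Sort(order: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}
case class Limit(n: Int, child: Plan) extends Plan {
  def output: Seq[String] = child.output
}

object PushProjectBeneathSort {
  // Proposal 1: move a Project below a Sort, but only when every sort
  // attribute survives the projection.
  def apply(plan: Plan): Plan = plan match {
    case Project(columns, Sort(order, child)) if order.forall(columns.contains) =>
      // Recurse on the new Project so it can keep moving down.
      Sort(order, apply(Project(columns, child)))
    case Project(columns, child) => Project(columns, apply(child))
    case Sort(order, child)      => Sort(order, apply(child))
    case Limit(n, child)         => Limit(n, apply(child))
    case leaf: Relation          => leaf
  }
}

object TakeOrderedSketch {
  type Row = Map[String, Any]

  // Proposal 2: sort, keep the first n rows, then apply the optional
  // projection to those n rows only. Because the projection runs last,
  // it may drop the sort key, which the rule above cannot handle.
  def takeOrderedAndProject[K: Ordering](
      rows: Seq[Row],
      n: Int,
      key: Row => K,
      columns: Option[Seq[String]]): Seq[Row] = {
    val top = rows.sortBy(key).take(n)
    columns.fold(top)(cols => top.map(_.filter { case (name, _) => cols.contains(name) }))
  }
}
```

In this toy model, `Limit(10, Project(Seq("a", "b"), Sort(Seq("a"), rel)))` is rewritten so that the `Limit` sits directly on top of the `Sort`, while a projection that drops the sort column `a` is left where it is and would have to be handled by the second helper instead.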


