Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/6780#discussion_r32367193
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -154,9 +170,6 @@ object ColumnPruning extends Rule[LogicalPlan] {
Join(left, prunedChild(right, allReferences), LeftSemi, condition)
- case Project(projectList, Limit(exp, child)) =>
--- End diff --
I don't think that we want to remove this rule. Its generally good to push
down projections because we might be able to continue to push them down.
Instead I propose we do two things:
- Add a rule that pushes Project beneath Sort when possible (i.e. we
aren't projecting away the Sort attributes).
- Make the `TakeOrdered` strategy and operator more general, such that it
can optionally handle a projection when necessary. This projection can be done
on the driver after the `rdd.take(...)`. The reasoning here is that Project is
one of the few 1-1 nodes and is a very common case such that we probably want
to be able to execute `limit -> project -> sort` efficiently.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]