[GitHub] cloud-fan commented on a change in pull request #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes

GitBox Sat, 15 Dec 2018 17:28:27 -0800

cloud-fan commented on a change in pull request #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes
URL: https://github.com/apache/spark/pull/23303#discussion_r241965281


 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##########
 @@ -403,10 +404,54 @@ object RemoveRedundantAliases extends Rule[LogicalPlan] {
 
 /**
  * Remove projections from the query plan that do not make any modifications.
+ * It handles top-level and intermediate [[Project]]s differently:
+ *  - Top-level:
+ *      A [[Project]] is only considered redundant if its output attributes are exactly the same as
+ *      its child's, including the order of attributes.
+ *      This affects how the outside world perceives this query plan.
+ *  - Intermediate (not top-level):
+ *      A [[Project]] is redundant as long as its outputSet is the same as the child's. It won't
+ *      affect the outer appearance so we're free to change the order of the output attributes.
+ *      We should, however, retain the [[Project]]s that have a shorter output attribute list than
+ *      the child's. That can reduce the materialized data size so it's worth keeping.
  */
 object RemoveRedundantProject extends Rule[LogicalPlan] {
 
 Review comment:
   This is too risky. Are there other ways to work around it? Or can we accept sub-optimal plans?
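
   For reference, the two cases described in the doc comment above can be sketched with a toy model. This is only an illustration under stated assumptions: `Plan`, `Relation`, `Project`, and `RedundantProjectSketch` are hypothetical stand-ins that use plain strings for attributes, not the Catalyst classes or the code in this PR, and the "not shorter than the child's list" check is one reading of the comment.

```scala
// Toy model only: hypothetical stand-ins, not Catalyst's LogicalPlan/Project.
sealed trait Plan { def output: Seq[String] }
case class Relation(output: Seq[String]) extends Plan
case class Project(output: Seq[String], child: Plan) extends Plan

object RedundantProjectSketch {
  // Top-level: redundant only if the attribute list matches the child's exactly,
  // order included, because this list is what the outside world sees.
  def redundantAtTop(p: Project): Boolean =
    p.output == p.child.output

  // Intermediate: redundant if the attribute *set* matches the child's.
  // Assumption: a Project with a shorter attribute list is kept, since it may
  // reduce the materialized data size.
  def redundantInside(p: Project): Boolean =
    p.output.toSet == p.child.output.toSet &&
      p.output.length >= p.child.output.length

  // Single top-down pass; a removed Project hands its position
  // (top-level or intermediate) to its child.
  def simplify(plan: Plan, isTop: Boolean = true): Plan = plan match {
    case p @ Project(_, child) =>
      val redundant = if (isTop) redundantAtTop(p) else redundantInside(p)
      if (redundant) simplify(child, isTop)
      else p.copy(child = simplify(child, isTop = false))
    case other => other
  }
}
```

   In this sketch a top-level Project that only reorders columns survives, while the same Project nested under another one is dropped, which is exactly the asymmetry the review comment is questioning.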
