rednaxelafx commented on a change in pull request #23303: [SPARK-26352][SQL]
ReorderJoin should not change the order of columns
URL: https://github.com/apache/spark/pull/23303#discussion_r241366694
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -48,8 +48,18 @@ object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper {
if projectList.forall(_.isInstanceOf[Attribute]) =>
reorder(p, p.output)
}
- // After reordering is finished, convert OrderedJoin back to Join
+
+ // Cleanups
result transformDown {
+ // if a Project was created to keep output attribute order after join reordering, but
Review comment:
This isn't really an improvement. It's here mainly to help existing tests
pass: I'm adding new projections to fix the output attribute order problem,
but these extra projections in the middle would make the (`expected`) query
plans look pretty ugly.
So consider this as two things that cancel each other out:
1. Add projections to fix the output attribute order;
2. If one of these extra projections is in the middle of the plan, get rid of it.
The cleanup (2) is only meant to remove extra projections created by (1).
Both (1) and (2) are in this PR, so I don't consider this a performance
improvement over the existing code.
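
To make the interplay of (1) and (2) concrete, here is a minimal sketch on a toy plan model. It does not use Catalyst's classes and is not the code in this PR: `Plan`, `Relation`, `Join`, `Project`, `ReorderSketch`, `restoreOrder`, and `cleanup` are all hypothetical names, and attributes are modeled as plain strings.

```scala
// Toy logical plan (an assumption for illustration, not Spark's Catalyst API).
sealed trait Plan { def output: Seq[String] }
case class Relation(output: Seq[String]) extends Plan
case class Join(left: Plan, right: Plan) extends Plan {
  // A join emits the left child's columns followed by the right child's.
  def output: Seq[String] = left.output ++ right.output
}
case class Project(projectList: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = projectList
}

object ReorderSketch {
  // (1) If reordering changed the column order, add a Project that restores it.
  def restoreOrder(reordered: Plan, expected: Seq[String]): Plan =
    if (reordered.output == expected) reordered else Project(expected, reordered)

  // A Project that only permutes its child's columns (no pruning, no renaming).
  private def isPureReorder(p: Project): Boolean =
    p.projectList.sorted == p.child.output.sorted

  // Remove every pure-reorder Project found anywhere in the tree.
  private def strip(plan: Plan): Plan = plan match {
    case p: Project if isPureReorder(p) => strip(p.child)
    case Project(list, child)           => Project(list, strip(child))
    case Join(l, r)                     => Join(strip(l), strip(r))
    case r: Relation                    => r
  }

  // (2) Drop the reorder-only Projects in the middle of the plan, then
  // re-apply (1) once at the root so the overall output order is unchanged.
  def cleanup(plan: Plan): Plan = restoreOrder(strip(plan), plan.output)

  def main(args: Array[String]): Unit = {
    val (a, b, c) = (Relation(Seq("a")), Relation(Seq("b")), Relation(Seq("c")))
    // Pretend the reorderer swapped both joins; (1) is applied at each level.
    val inner = restoreOrder(Join(b, a), Seq("a", "b"))
    val outer = restoreOrder(Join(c, inner), Seq("a", "b", "c"))
    println(outer)          // has a Project in the middle and one at the root
    println(cleanup(outer)) // only the root Project remains
  }
}
```

In this toy version the net effect of (1) followed by (2) is at most one extra `Project` at the root, which is why the two steps are best read as a pair rather than as an optimization in their own right.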