[GitHub] [spark] peter-toth commented on a diff in pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into account

GitBox Sun, 15 Jan 2023 12:36:00 -0800


peter-toth commented on code in PR #37525:
URL: https://github.com/apache/spark/pull/37525#discussion_r1070665499



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala:
##########
@@ -70,53 +66,16 @@ trait AliasAwareOutputExpression extends SQLConfHelper {
   protected def normalizeExpression(
       expr: Expression,
       pruneFunc: (Expression, AttributeSet) => Option[Expression]): Seq[Expression] = {
-    val normalizedCandidates = new mutable.HashSet[Expression]()
-    normalizedCandidates.add(expr)
     val outputSet = AttributeSet(outputExpressions.map(_.toAttribute))
-
-    def pruneCandidate(candidate: Expression): Option[Expression] = {
+    expr.multiTransform {

Review Comment:
   @cloud-fan, @ulysses-you I've updated this PR. Now it is based on 
`multiTransform` and contains changes from both this PR and 
https://github.com/apache/spark/pull/39556 (see the description).
   
   `normalizeExpression()` becomes as simple as this with `multiTransform`.
   
   Please note that `pruneFunc` is currently used only for "after 
transformation filtering". However, because `multiTransform` does the mapping 
in "one run" (unlike the removed code, which runs a `transform` for each 
alias), it is much more efficient than the removed version when there is a 
high number of aliases.
   
   Some early pruning would also be possible using `multiTransform`; I will 
show you that version a bit later.
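   
   To illustrate the "one run" idea, here is a minimal, self-contained sketch. 
It does not use Spark: `Expr`, `Attr`, `Add`, `references` and this 
`multiTransform` helper are hypothetical stand-ins for illustration only, not 
the actual `TreeNode` API in this PR. The rule maps a node to all of its 
alternatives, and a single traversal builds every combination; the filter at 
the end plays the role of the "after transformation filtering".

```scala
// Toy expression tree for illustration; NOT Spark's Expression/TreeNode classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object MultiTransformSketch {
  // Hypothetical helper: where `rule` applies, the node is replaced by each of
  // its alternatives; otherwise the children are expanded recursively and
  // combined. All candidates come out of a single traversal of the tree.
  def multiTransform(e: Expr)(rule: PartialFunction[Expr, Seq[Expr]]): Seq[Expr] =
    if (rule.isDefinedAt(e)) {
      rule(e)
    } else {
      e match {
        case Add(l, r) =>
          for {
            nl <- multiTransform(l)(rule)
            nr <- multiTransform(r)(rule)
          } yield Add(nl, nr)
        case leaf => Seq(leaf)
      }
    }

  // Names of all attributes referenced by an expression.
  def references(e: Expr): Set[String] = e match {
    case Attr(n)   => Set(n)
    case Add(l, r) => references(l) ++ references(r)
  }

  def main(args: Array[String]): Unit = {
    // Assumed scenario: `a` is aliased as both `x` and `y`, `b` as `z`,
    // and `c` has no alias and is not part of the output.
    val aliasRule: PartialFunction[Expr, Seq[Expr]] = {
      case Attr("a") => Seq(Attr("x"), Attr("y"))
      case Attr("b") => Seq(Attr("z"))
    }
    val outputSet = Set("x", "y", "z")

    val inputs = Seq(Add(Attr("a"), Attr("b")), Add(Attr("a"), Attr("c")))
    inputs.foreach { expr =>
      // One traversal yields every alias combination ...
      val candidates = multiTransform(expr)(aliasRule)
      // ... and candidates that still reference non-output attributes are
      // dropped afterwards.
      val kept = candidates.filter(c => references(c).subsetOf(outputSet))
      println(s"$expr => $kept")
    }
  }
}
```

   In this sketch the candidates for `a + c` are generated first and only then 
discarded, which is the part that early pruning could avoid.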



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


