[GitHub] [spark] cloud-fan commented on a diff in pull request #39170: [SPARK-41674][SQL] Runtime filter should supports multi level shuffle join side as filter creation side

via GitHub Mon, 10 Apr 2023 00:41:55 -0700


cloud-fan commented on code in PR #39170:
URL: https://github.com/apache/spark/pull/39170#discussion_r1161512196



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala:
##########
@@ -114,11 +114,13 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with J
   }
 
   /**
-   * Returns whether the plan is a simple filter over scan and the filter is likely selective
+   * Returns whether the plan exists a simple filter over scan and the filter is likely selective
    * Also check if the plan only has simple expressions (attribute reference, literals) so that we
    * do not add a subquery that might have an expensive computation

Review Comment:
   Let's add some more comments to introduce the theory: Runtime filters use 
one side of the join to build a set of join key values and prune the other side 
of the join. It's also OK to use a superset of the join key values to do the 
pruning. For inner joins, one side of the join always produces a superset of 
the join key values.
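
   To make the superset argument concrete, here is a minimal sketch using plain Scala collections rather than Spark's optimizer. The row types (`Order`, `Customer`, `Vip`), the sample data, and the helpers `prune` and `joinWithCreationSide` are all hypothetical and only illustrate the idea; this is not how `InjectRuntimeFilter` is implemented. The creation side is itself an inner join, and the sketch checks that pruning with keys taken from one child of that join (a superset of the exact keys) gives the same join result as pruning with the exact keys.

```scala
// Illustrative sketch only: plain Scala collections, not Spark's InjectRuntimeFilter.
object RuntimeFilterSketch {
  // Hypothetical row types, introduced for this example.
  final case class Order(customerId: Int, amount: Double) // side to be pruned
  final case class Customer(id: Int, region: String)      // one child of the creation side
  final case class Vip(customerId: Int)                   // other child of the creation side

  val orders = Seq(Order(1, 10.0), Order(2, 20.0), Order(3, 30.0), Order(4, 40.0))
  val customers = Seq(Customer(1, "EU"), Customer(2, "US"), Customer(3, "EU"))
  val vips = Seq(Vip(1), Vip(2))

  // A selective filter over a scan: only EU customers.
  val euCustomers: Seq[Customer] = customers.filter(_.region == "EU")

  // The creation side is an inner join of euCustomers and vips on the customer id.
  val creationSide: Seq[Customer] =
    for {
      c <- euCustomers
      v <- vips
      if c.id == v.customerId
    } yield c

  // Exact join key values produced by the creation side.
  val exactKeys: Set[Int] = creationSide.map(_.id).toSet
  // Keys taken from one child of the inner join: a superset of exactKeys.
  val supersetKeys: Set[Int] = euCustomers.map(_.id).toSet

  /** Drop orders whose join key cannot match, given a set of candidate keys. */
  def prune(os: Seq[Order], keys: Set[Int]): Seq[Order] =
    os.filter(o => keys.contains(o.customerId))

  /** Inner join of the (possibly pruned) orders with the creation side. */
  def joinWithCreationSide(os: Seq[Order]): Seq[(Order, Customer)] =
    for {
      o <- os
      c <- creationSide
      if o.customerId == c.id
    } yield (o, c)

  val baseline = joinWithCreationSide(orders)
  val withExact = joinWithCreationSide(prune(orders, exactKeys))
  val withSuperset = joinWithCreationSide(prune(orders, supersetKeys))
}
```

   In this sketch the superset keys let one extra order through the pruning step, but the final join discards it, so the result is unchanged; the only cost of a looser key set is weaker pruning.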
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
