[GitHub] [spark] aokolnychyi commented on a diff in pull request #38557: [SPARK-38959][SQL][FOLLOWUP] Optimizer batch `PartitionPruning` should optimize subqueries

GitBox Tue, 08 Nov 2022 18:48:36 -0800


aokolnychyi commented on code in PR #38557:
URL: https://github.com/apache/spark/pull/38557#discussion_r1017352098



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/RowLevelOperationRuntimeGroupFiltering.scala:
##########
@@ -89,10 +88,8 @@ case class RowLevelOperationRuntimeGroupFiltering(optimizeSubqueries: Rule[Logic
       buildKeys: Seq[Attribute],
       pruningKeys: Seq[Attribute]): Expression = {
 
-    val buildQuery = Project(buildKeys, matchingRowsPlan)
-    val dynamicPruningSubqueries = pruningKeys.zipWithIndex.map { case (key, index) =>
-      DynamicPruningSubquery(key, buildQuery, buildKeys, index, onlyInBroadcast = false)
-    }
-    dynamicPruningSubqueries.reduce(And)
+    val buildQuery = Aggregate(buildKeys, buildKeys, matchingRowsPlan)

Review Comment:
   Got it. My original worry was that we could miss future optimizations, since dynamic pruning for row-level operations would take a different route than normal DPP.
   
   One alternative would be to extend `DynamicPruningSubquery` with a flag indicating whether it should be optimized. Up to you, though.
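
   To make the flag idea concrete, here is a minimal sketch using toy stand-in types rather than Spark's actual classes. The names `PruningSubquery`, `optimizeBuildQuery`, and `FlagSketch.optimizeSubquery` are hypothetical and only illustrate the shape of the change. The real `DynamicPruningSubquery` and the optimizer rules may need something different.

```scala
// Toy stand-ins for logical plans; these are NOT Spark's real classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class Aggregate(groupingKeys: Seq[String], child: Plan) extends Plan

// Hypothetical shape of a pruning subquery that carries an extra flag.
// `optimizeBuildQuery` is an assumed name; it does not exist in Spark.
final case class PruningSubquery(
    pruningKey: String,
    buildQuery: Plan,
    buildKeys: Seq[String],
    broadcastKeyIndex: Int,
    onlyInBroadcast: Boolean,
    optimizeBuildQuery: Boolean = true)

object FlagSketch {
  // Stand-in for an optimizer rule over subqueries: the build plan is
  // rewritten only when the flag allows it, otherwise it is left untouched.
  def optimizeSubquery(s: PruningSubquery)(rule: Plan => Plan): PruningSubquery =
    if (s.optimizeBuildQuery) s.copy(buildQuery = rule(s.buildQuery)) else s
}
```

   With a flag like this, row-level operations and normal DPP could share one expression type, and callers would opt in or out of subquery optimization per instance.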



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

