HyukjinKwon commented on a change in pull request #25600: [SPARK-11150][SQL]
Dynamic Partition Pruning
URL: https://github.com/apache/spark/pull/25600#discussion_r318405325
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -48,7 +48,9 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
}
override protected val blacklistedOnceBatches: Set[String] =
- Set("Extract Python UDFs")
+ Set(
+ "PartitionPruning",
Review comment:
Copied and pasted from the offline discussion with @maryannxue:
The problem seems to be that `ExtractPythonUDFFromAggregate` is not
idempotent.
When `OptimizeSubqueries` optimizes the subquery plan at
https://github.com/apache/spark/blob/bab88c48b1432249571aae90bc56b40d74f2fa88/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L240
`s.plan` is as below:
```
Subquery
+- Project [pyUDF(agg#32) AS CAST(pyUDF(cast(min(k) as string)) AS STRING)#27]
   +- Aggregate [min(k#18) AS agg#32]
      +- Project [_1#13 AS k#18]
         +- LocalRelation [_1#13, _2#14]
```
After `CollapseProject`:
```
Subquery
+- Aggregate [pyUDF(min(k#18)) AS CAST(pyUDF(cast(min(k) as string)) AS STRING)#27]
   +- Project [_1#13 AS k#18]
      +- LocalRelation [_1#13, _2#14]
```
After `ExtractPythonUDFFromAggregate`:
```
Subquery
+- Project [pyUDF(agg#33) AS CAST(pyUDF(cast(min(k) as string)) AS STRING)#27]
   +- Aggregate [min(k#18) AS agg#33]
      +- Project [_1#13 AS k#18]
         +- LocalRelation [_1#13, _2#14]
```
The plan has the same shape as before, but with a new alias (`agg#33` instead of `agg#32`), which seems to be created at
https://github.com/apache/spark/blob/77c7e91e029a9a70678435acb141154f2f51882e/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala#L63
So adding it to the blacklist makes sense to me for now.
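
To make the cycle above concrete, here is a minimal toy sketch in plain Scala. It does not use any Spark classes: `Attr`, `AggregateWithUdf`, `ProjectOverAggregate`, `collapseProject`, `extractUdf`, `runBatch` and `normalize` are all hypothetical stand-ins, and it assumes that plan equality is sensitive to the alias ID. It only illustrates how a collapse step followed by an extract step that mints a fresh alias yields a different plan on every run, even though the shape is unchanged.

```scala
import java.util.concurrent.atomic.AtomicLong

object NonIdempotentRuleSketch {
  // Toy stand-ins for Catalyst nodes; none of these are Spark classes.
  final case class Attr(name: String, id: Long)

  sealed trait Plan
  case object LocalRelation extends Plan
  // Models: Aggregate [udf(agg)]
  final case class AggregateWithUdf(udf: String, agg: String, child: Plan) extends Plan
  // Models: Project [udf(attr)] +- Aggregate [agg AS attr]
  final case class ProjectOverAggregate(udf: String, agg: String, attr: Attr, child: Plan)
      extends Plan

  // Hypothetical stand-in for a global expression-ID generator.
  private val nextId = new AtomicLong(33L)

  // Plays the role of CollapseProject: fold the Project into the Aggregate below it.
  def collapseProject(plan: Plan): Plan = plan match {
    case ProjectOverAggregate(udf, agg, _, child) => AggregateWithUdf(udf, agg, child)
    case other => other
  }

  // Plays the role of ExtractPythonUDFFromAggregate: pull the UDF back out of the
  // Aggregate, minting a fresh alias ID on every application.
  def extractUdf(plan: Plan): Plan = plan match {
    case AggregateWithUdf(udf, agg, child) =>
      ProjectOverAggregate(udf, agg, Attr("agg", nextId.getAndIncrement()), child)
    case other => other
  }

  // One pass of the toy batch: collapse, then extract again.
  def runBatch(plan: Plan): Plan = extractUdf(collapseProject(plan))

  // Erase alias IDs so that plans can be compared by shape only.
  def normalize(plan: Plan): Plan = plan match {
    case p: ProjectOverAggregate => p.copy(attr = p.attr.copy(id = 0L))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val start = ProjectOverAggregate("pyUDF", "min(k)", Attr("agg", 32L), LocalRelation)
    val once = runBatch(start)
    val twice = runBatch(once)
    println(once == twice)                       // false: the alias IDs differ
    println(normalize(once) == normalize(twice)) // true: the shape is identical
  }
}
```

In this sketch a second run of the batch always changes the plan, so a check that expects a run-once batch to be a fixed point would fail unless it ignores alias IDs or the batch is exempted.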