GitHub user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/19630#discussion_r151448531
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
---
@@ -137,15 +138,18 @@ object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper {
udf.references.subsetOf(child.outputSet)
}
if (validUdfs.nonEmpty) {
- if (validUdfs.exists(_.pythonUdfType == PythonUdfType.PANDAS_GROUPED_UDF)) {
- throw new IllegalArgumentException("Can not use grouped vectorized UDFs")
- }
+ require(validUdfs.forall(udf =>
+ udf.evalType == PythonEvalType.SQL_BATCHED_UDF ||
+ udf.evalType == PythonEvalType.PANDAS_SCALAR_UDF
+ ), "Can only extract scalar vectorized udf or sql batch udf")
--- End diff --
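I think an "equal" comparison (listing the supported eval types) is better
than a "not equal" one (listing the unsupported ones), because we might add
new types, and a new type should be rejected until it is explicitly supported.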
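
To make the difference concrete, here is a minimal sketch. The `EvalType` constants and the `FutureUdf` entry are hypothetical stand-ins with made-up values, not Spark's actual `PythonEvalType` definitions; the point is only how each style of check treats a type that did not exist when the check was written.

```scala
// Hypothetical stand-ins for the eval type constants; values are illustrative only.
object EvalType {
  val SqlBatchedUdf = 0
  val PandasScalarUdf = 1
  val PandasGroupedUdf = 2
  val FutureUdf = 3 // assumed: a type added after the check was written
}

object EvalTypeCheck {
  import EvalType._

  // "Not equal" style: rejects only the types already known to be unsupported.
  def passesNotEqualCheck(evalType: Int): Boolean =
    evalType != PandasGroupedUdf

  // "Equal" style: accepts only the types already known to be supported.
  def passesEqualCheck(evalType: Int): Boolean =
    evalType == SqlBatchedUdf || evalType == PandasScalarUdf
}
```

With the "not equal" check, `FutureUdf` would be accepted without anyone having decided that extraction is valid for it; with the "equal" check it fails until someone adds it to the list.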