sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r624646519
##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -23,39 +23,76 @@
/**
* Interface for a function that produces a result value for each input row.
* <p>
- * To evaluate each input row, Spark will first try to lookup and use a "magic method" (described
- * below) through Java reflection. If the method is not found, Spark will call
- * {@link #produceResult(InternalRow)} as a fallback approach.
+ * To evaluate each input row, Spark will first try to lookup and use either a static or
+ * non-static "magic method" (described below) through Java reflection. If neither of the
+ * magic methods is not found, Spark will call {@link #produceResult(InternalRow)} as a fallback
+ * approach. In other words, the precedence is as follow:
+ * <ul>
+ * <li>static magic method</li>
+ * <li>non-static magic method</li>
+ * <li>{@link #produceResult(InternalRow)}</li>
+ * </ul>
* <p>
* The JVM type of result values produced by this function must be the type used by Spark's
* InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is defined below.
* <p>
* <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
* {@link UnsupportedOperationException}. Users can choose to override this method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java
- * reflection and will also provide better performance in general, due to optimizations such as
- * codegen, removal of Java boxing, etc.
- *
- * For example, a scalar UDF for adding two integers can be defined as follow with the magic
+ * a static magic method with name {@link #STATIC_MAGIC_METHOD_NAME}, or non-static magic
Review comment:
It turns out that `zipWithIndex` in `ApplyFunctionExpression` is also quite
expensive compared to our simple add function. After removing it and fixing
SPARK-35281, I no longer see a regression when the result is nullable, and the
gap between `produceResult` and the magic methods becomes narrower:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Mac OS X 10.16
[info] Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
[info] Java scalar function (long + long) -> long/notnull wholestage on:   Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------------------------------
[info] with long_add_default                                                       19794          19860         113        25.3          39.6       1.0X
[info] with long_add_magic                                                          7627           7772         228        65.6          15.3       2.6X
[info] with long_add_static_magic                                                   6631           6725          82        75.4          13.3       3.0X

[info] OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Mac OS X 10.16
[info] Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
[info] Java scalar function (long + long) -> long/nullable wholestage on:  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
[info] -------------------------------------------------------------------------------------------------------------------------------------------------
[info] with long_add_default                                                       19990          20243         275        25.0          40.0       1.0X
[info] with long_add_magic                                                          7285           7435         141        68.6          14.6       2.7X
[info] with long_add_static_magic                                                   7077           7170         145        70.6          14.2       2.8X
```
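
For anyone reading the numbers without the benchmark source at hand, here is a rough, self-contained sketch of the three shapes being compared and of the lookup precedence described in the Javadoc above (static magic method, then non-static magic method, then `produceResult`). It is only an illustration under assumptions: `SketchScalarFunction`, `MagicMethodSketch`, and the method names `invoke` / `staticInvoke` are placeholders invented for this example. It does not use Spark's `ScalarFunction` or `InternalRow`, and it is not how Spark resolves or code-generates the call.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a scalar function; NOT Spark's ScalarFunction.
interface SketchScalarFunction {
    // Fallback path: all arguments arrive boxed in one array, like a row.
    default Object produceResult(Object[] row) {
        throw new UnsupportedOperationException("no magic method, no produceResult");
    }
}

// Counterpart of long_add_default: only the row-based fallback is implemented.
class LongAddDefault implements SketchScalarFunction {
    @Override
    public Object produceResult(Object[] row) {
        return (Long) row[0] + (Long) row[1];
    }
}

// Counterpart of long_add_magic: a non-static method taking primitives.
class LongAddMagic implements SketchScalarFunction {
    public long invoke(long left, long right) {
        return left + right;
    }
}

// Counterpart of long_add_static_magic: a static method taking primitives.
class LongAddStaticMagic implements SketchScalarFunction {
    public static long staticInvoke(long left, long right) {
        return left + right;
    }
}

class MagicMethodSketch {
    // Placeholder names, assumed for this sketch only.
    static final String STATIC_NAME = "staticInvoke";
    static final String INSTANCE_NAME = "invoke";

    // Returns the (long, long) method only if it exists and its static-ness matches.
    static Method find(Class<?> cls, String name, boolean wantStatic) {
        try {
            Method m = cls.getMethod(name, long.class, long.class);
            return Modifier.isStatic(m.getModifiers()) == wantStatic ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    // Reports which path the precedence rule would pick for this function.
    static String resolve(SketchScalarFunction fn) {
        if (find(fn.getClass(), STATIC_NAME, true) != null) return "static";
        if (find(fn.getClass(), INSTANCE_NAME, false) != null) return "instance";
        return "fallback";
    }

    // Evaluates a + b through whichever path the precedence rule selects.
    static long call(SketchScalarFunction fn, long a, long b) {
        try {
            Method s = find(fn.getClass(), STATIC_NAME, true);
            if (s != null) return (Long) s.invoke(null, a, b);
            Method i = find(fn.getClass(), INSTANCE_NAME, false);
            if (i != null) return (Long) i.invoke(fn, a, b);
            return (Long) fn.produceResult(new Object[] {a, b});
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchScalarFunction[] fns = {
            new LongAddDefault(), new LongAddMagic(), new LongAddStaticMagic()
        };
        for (SketchScalarFunction fn : fns) {
            System.out.println(resolve(fn) + " -> " + call(fn, 2L, 3L));
        }
    }
}
```

The sketch calls the methods reflectively on every row, so it says nothing about the timings above. It only shows why the fallback has to box its arguments into a row-like container while both magic variants can take primitive `long` parameters directly.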