[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

GitBox Sun, 02 May 2021 00:23:54 -0700


sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r624646519




##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -23,39 +23,76 @@
 /**
  * Interface for a function that produces a result value for each input row.
  * <p>
- * To evaluate each input row, Spark will first try to lookup and use a "magic method" (described
- * below) through Java reflection. If the method is not found, Spark will call
- * {@link #produceResult(InternalRow)} as a fallback approach.
+ * To evaluate each input row, Spark will first try to lookup and use either a static or
+ * non-static "magic method" (described below) through Java reflection. If neither of the
+ * magic methods is not found, Spark will call {@link #produceResult(InternalRow)} as a fallback
+ * approach. In other words, the precedence is as follow:
+ * <ul>
+ *   <li>static magic method</li>
+ *   <li>non-static magic method</li>
+ *   <li>{@link #produceResult(InternalRow)}</li>
+ * </ul>
  * <p>
  * The JVM type of result values produced by this function must be the type used by Spark's
  * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is defined below.
  * <p>
  * <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
  * {@link UnsupportedOperationException}. Users can choose to override this method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java
- * reflection and will also provide better performance in general, due to optimizations such as
- * codegen, removal of Java boxing, etc.
- *
- * For example, a scalar UDF for adding two integers can be defined as follow with the magic
+ * a static magic method with name {@link #STATIC_MAGIC_METHOD_NAME}, or non-static magic

Review comment:
       Turns out `zipWithIndex` in `ApplyFunctionExpression` is also quite expensive compared to our simple add function. After removing that and fixing SPARK-35281, I no longer see a regression when the result is nullable, and the gap between `produceResult` and the magic method becomes narrower:
   
   ```
   [info] OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Mac OS X 10.16
   [info] Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   [info] Java scalar function (long + long) -> long/notnull wholestage on:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------------------------------
   [info] with long_add_default                                                      19794          19860         113         25.3          39.6       1.0X
   [info] with long_add_magic                                                         7627           7772         228         65.6          15.3       2.6X
   [info] with long_add_static_magic                                                  6631           6725          82         75.4          13.3       3.0X
   
   [info] OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Mac OS X 10.16
   [info] Intel(R) Core(TM) i9-10910 CPU @ 3.60GHz
   [info] Java scalar function (long + long) -> long/nullable wholestage on:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] -------------------------------------------------------------------------------------------------------------------------------------------------
   [info] with long_add_default                                                       19990          20243         275         25.0          40.0       1.0X
   [info] with long_add_magic                                                          7285           7435         141         68.6          14.6       2.7X
   [info] with long_add_static_magic                                                   7077           7170         145         70.6          14.2       2.8X
   ```
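
For readers following along, here is a rough, self-contained sketch of the lookup precedence the Javadoc in this diff describes (static magic method, then non-static magic method, then `produceResult`). It does not use Spark at all: `RowFunction`, the three `*Add` classes, and the method names `staticInvoke` and `invoke` are placeholders assumed for illustration, not the values of `STATIC_MAGIC_METHOD_NAME` or `MAGIC_METHOD_NAME`, and plain reflection stands in for whatever codegen path Spark actually takes.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class MagicMethodLookupSketch {
    // Placeholder names for this sketch only; NOT Spark's actual constants.
    static final String STATIC_NAME = "staticInvoke";
    static final String INSTANCE_NAME = "invoke";

    /** Hypothetical stand-in for a row-based function; the row is a plain Object[]. */
    public interface RowFunction {
        default Object produceResult(Object[] row) {
            throw new UnsupportedOperationException("produceResult is not implemented");
        }
    }

    /** Stateless variant: exposes only a static magic method. */
    public static class StaticAdd implements RowFunction {
        public static long staticInvoke(long a, long b) { return a + b; }
    }

    /** Exposes only a non-static magic method. */
    public static class InstanceAdd implements RowFunction {
        public long invoke(long a, long b) { return a + b; }
    }

    /** Exposes neither magic method, so only the row-based fallback is available. */
    public static class DefaultAdd implements RowFunction {
        @Override
        public Object produceResult(Object[] row) {
            long a = (Long) row[0];
            long b = (Long) row[1];
            return a + b;
        }
    }

    /** Returns the public (long, long) method with the given name and static-ness, or null. */
    static Method find(Class<?> cls, String name, boolean wantStatic) {
        try {
            Method m = cls.getMethod(name, long.class, long.class);
            return Modifier.isStatic(m.getModifiers()) == wantStatic ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Reports which of the three paths would be chosen for the given function. */
    public static String resolve(RowFunction fn) {
        if (find(fn.getClass(), STATIC_NAME, true) != null) return "static magic method";
        if (find(fn.getClass(), INSTANCE_NAME, false) != null) return "non-static magic method";
        return "produceResult";
    }

    /** Evaluates fn(a, b) using the same precedence as resolve(). */
    public static long call(RowFunction fn, long a, long b) {
        try {
            Method m = find(fn.getClass(), STATIC_NAME, true);
            if (m != null) return (Long) m.invoke(null, a, b);   // no receiver needed
            m = find(fn.getClass(), INSTANCE_NAME, false);
            if (m != null) return (Long) m.invoke(fn, a, b);     // receiver is the function
            return (Long) fn.produceResult(new Object[] {a, b}); // boxed, row-based fallback
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        RowFunction[] fns = {new StaticAdd(), new InstanceAdd(), new DefaultAdd()};
        for (RowFunction fn : fns) {
            System.out.println(resolve(fn) + " -> " + call(fn, 1L, 2L));
        }
    }
}
```

Only the fallback path in this sketch has to pack its arguments into an array; that is meant to mirror the shape of `produceResult(InternalRow)` versus a magic method taking individual parameters, not to reproduce the benchmark numbers above.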
   
   
   
   






