sunchao commented on a change in pull request #32082:
URL: https://github.com/apache/spark/pull/32082#discussion_r614238811
##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -30,10 +30,71 @@
* <p>
* The JVM type of result values produced by this function must be the type used by Spark's
* InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * <p>
+ * <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
+ * {@link UnsupportedOperationException}. Users can choose to override this method, or implement
+ * a "magic method" named {@link #MAGIC_METHOD_NAME}, which takes individual parameters
+ * instead of an {@link InternalRow}. The magic method will be loaded by Spark through Java
+ * reflection and will generally provide better performance, due to optimizations such as
+ * codegen and the elimination of Java boxing.
+ *
+ * For example, a scalar UDF for adding two integers can be defined as follows with the magic
+ * method approach:
+ *
+ * <pre>
+ *   public class IntegerAdd implements {@code ScalarFunction<Integer>} {
+ *     public int invoke(int left, int right) {
+ *       return left + right;
+ *     }
+ *
+ *     {@literal @}Override
+ *     public Integer produceResult(InternalRow input) {
+ *       int left = input.getInt(0);
+ *       int right = input.getInt(1);
+ *       return left + right;
+ *     }
+ *   }
+ * </pre>
+ * In this case, both {@link #MAGIC_METHOD_NAME} and {@link #produceResult} are defined, and
+ * Spark will first look up the {@link #MAGIC_METHOD_NAME} method during query analysis. This is
+ * done by first converting the actual input SQL data types to their corresponding Java types
+ * following the mapping defined below, and then checking whether there is a matching method
+ * among all the declared methods in the UDF class, using the method name (i.e.,
+ * {@link #MAGIC_METHOD_NAME}) and the Java types. If no magic method is found, Spark falls
+ * back to {@link #produceResult}.
+ * <p>
+ * The following is the mapping from {@link DataType SQL data type} to Java type for
+ * the magic method approach:
+ * <ul>
+ * <li>{@link org.apache.spark.sql.types.BooleanType}: {@code boolean}</li>
+ * <li>{@link org.apache.spark.sql.types.ByteType}: {@code byte}</li>
+ * <li>{@link org.apache.spark.sql.types.ShortType}: {@code short}</li>
+ * <li>{@link org.apache.spark.sql.types.IntegerType}: {@code int}</li>
+ * <li>{@link org.apache.spark.sql.types.LongType}: {@code long}</li>
+ * <li>{@link org.apache.spark.sql.types.FloatType}: {@code float}</li>
+ * <li>{@link org.apache.spark.sql.types.DoubleType}: {@code double}</li>
+ * <li>{@link org.apache.spark.sql.types.StringType}:
+ * {@link org.apache.spark.unsafe.types.UTF8String}</li>
+ * <li>{@link org.apache.spark.sql.types.DateType}: {@code int}</li>
+ * <li>{@link org.apache.spark.sql.types.TimestampType}: {@code long}</li>
+ * <li>{@link org.apache.spark.sql.types.BinaryType}: {@code byte[]}</li>
+ * <li>{@link org.apache.spark.sql.types.CalendarIntervalType}:
Review comment:
Sure. Where can I see more info on this? It seems [the class](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/CalendarIntervalType.scala) is still marked as stable, and there is no info about the deprecation.
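
For readers following the type mapping discussed in the Javadoc above, here is a minimal, hypothetical sketch (not part of this PR) of a magic-method-only UDF that exercises the StringType to UTF8String mapping. The class name `StringLength` and the function name `string_length` are invented for illustration, and it assumes the `name()`, `inputTypes()` and `resultType()` methods declared by the `BoundFunction` parent interface:

```java
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.unsafe.types.UTF8String;

// Hypothetical example: returns the number of characters in a string argument.
public class StringLength implements ScalarFunction<Integer> {

  @Override
  public DataType[] inputTypes() {
    // A single StringType argument, which arrives as UTF8String in the magic method.
    return new DataType[] { DataTypes.StringType };
  }

  @Override
  public DataType resultType() {
    return DataTypes.IntegerType;
  }

  @Override
  public String name() {
    return "string_length";
  }

  // Magic method (MAGIC_METHOD_NAME = "invoke"), located by Spark via reflection.
  public int invoke(UTF8String str) {
    return str.numChars();
  }

  // produceResult is intentionally not overridden; its default implementation throws
  // UnsupportedOperationException, so this UDF only works through the magic method.
}
```

Note that declaring the magic method as `invoke(String str)` would not match, since StringType maps to UTF8String, and Spark would then fall back to the (unimplemented) `produceResult`.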
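
Purely for illustration (this is not Spark's analyzer code), the lookup behavior described in the Javadoc amounts to a reflective search by method name and mapped Java parameter types, with `produceResult` as the fallback. The helper below is a hypothetical sketch of that idea:

```java
import java.lang.reflect.Method;

final class MagicMethodLookup {

  // Hypothetical helper: returns the "magic method" if the UDF class declares one whose
  // Java parameter types match the mapped SQL input types, or null otherwise.
  static Method findMagicMethod(Class<?> udfClass, Class<?>... javaInputTypes) {
    try {
      // MAGIC_METHOD_NAME is "invoke"; getDeclaredMethod matches both name and parameter types.
      return udfClass.getDeclaredMethod("invoke", javaInputTypes);
    } catch (NoSuchMethodException e) {
      // No matching magic method: Spark would fall back to produceResult(InternalRow).
      return null;
    }
  }
}
```

For the `StringLength` sketch above, `findMagicMethod(StringLength.class, UTF8String.class)` would locate the magic method, while any mismatch in parameter types would return null and imply the `produceResult` path.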