sunchao commented on a change in pull request #32082:
URL: https://github.com/apache/spark/pull/32082#discussion_r620710738
##########
File path:
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -23,17 +23,68 @@
/**
* Interface for a function that produces a result value for each input row.
* <p>
- * For each input row, Spark will call a produceResult method that corresponds to the
- * {@link #inputTypes() input data types}. The expected JVM argument types must be the types used by
- * Spark's InternalRow API. If no direct method is found or when not using codegen, Spark will call
- * {@link #produceResult(InternalRow)}.
+ * To evaluate each input row, Spark will first try to look up and use a "magic method" (described
+ * below) through Java reflection. If the method is not found, Spark will call
+ * {@link #produceResult(InternalRow)} as a fallback approach.
* <p>
 * The JVM type of result values produced by this function must be the type used by Spark's
 * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * <p>
+ * <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
+ * {@link UnsupportedOperationException}. Users can choose to override this method, or implement
+ * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters
+ * instead of an {@link InternalRow}. The magic method will be loaded by Spark through Java
+ * reflection and will generally also provide better performance, due to optimizations such as
+ * codegen and removal of Java boxing.
+ *
+ * For example, a scalar UDF for adding two integers can be defined as follows with the magic
+ * method approach:
+ *
+ * <pre>
+ * public class IntegerAdd implements {@code ScalarFunction<Integer>} {
+ * public int invoke(int left, int right) {
+ * return left + right;
+ * }
+ * }
+ * </pre>
+ * In this case, since {@link #MAGIC_METHOD_NAME} is defined, Spark will use it over
+ * {@link #produceResult} to evaluate the inputs. In general, Spark looks up the magic method by
+ * first converting the actual input SQL data types to their corresponding Java types following
+ * the mapping defined below, and then checking if there is a matching method among all the
+ * declared methods in the UDF class, using the method name (i.e., {@link #MAGIC_METHOD_NAME}) and
+ * the Java types. If no magic method is found, Spark falls back to {@link #produceResult}.
+ * <p>
+ * The following is the mapping from {@link DataType SQL data type} to Java type used by
+ * the magic method approach:
+ * <ul>
+ * <li>{@link org.apache.spark.sql.types.BooleanType}: {@code boolean}</li>
+ * <li>{@link org.apache.spark.sql.types.ByteType}: {@code byte}</li>
+ * <li>{@link org.apache.spark.sql.types.ShortType}: {@code short}</li>
+ * <li>{@link org.apache.spark.sql.types.IntegerType}: {@code int}</li>
+ * <li>{@link org.apache.spark.sql.types.LongType}: {@code long}</li>
+ * <li>{@link org.apache.spark.sql.types.FloatType}: {@code float}</li>
+ * <li>{@link org.apache.spark.sql.types.DoubleType}: {@code double}</li>
+ * <li>{@link org.apache.spark.sql.types.StringType}: {@link org.apache.spark.unsafe.types.UTF8String}</li>
+ * <li>{@link org.apache.spark.sql.types.DateType}: {@code int}</li>
+ * <li>{@link org.apache.spark.sql.types.TimestampType}: {@code long}</li>
+ * <li>{@link org.apache.spark.sql.types.BinaryType}: {@code byte[]}</li>
+ * <li>{@link org.apache.spark.sql.types.DayTimeIntervalType}: {@code long}</li>
+ * <li>{@link org.apache.spark.sql.types.YearMonthIntervalType}: {@code int}</li>
+ * <li>{@link org.apache.spark.sql.types.DecimalType}: {@link org.apache.spark.sql.types.Decimal}</li>
+ * <li>{@link org.apache.spark.sql.types.StructType}: {@link InternalRow}</li>
+ * <li>{@link org.apache.spark.sql.types.ArrayType}: {@link org.apache.spark.sql.catalyst.util.ArrayData}</li>
+ * <li>{@link org.apache.spark.sql.types.MapType}: {@link org.apache.spark.sql.catalyst.util.MapData}</li>
+ * <li>any other type: {@code Object}</li>
Review comment:
we don't need this - will remove.
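The reflection-based lookup the Javadoc describes can be sketched in plain Java. This is only an illustration of the mechanism (find a declared method named "invoke" matching the converted Java parameter types, otherwise fall back); the method and class names here are hypothetical and are not Spark's actual internals:

```java
import java.lang.reflect.Method;

public class MagicLookupDemo {
    // Hypothetical UDF class: defines the "magic method" named "invoke".
    static class IntegerAdd {
        public int invoke(int left, int right) {
            return left + right;
        }
    }

    // Sketch of the lookup: try to resolve a declared method called "invoke"
    // whose parameter types match the Java types converted from the input SQL
    // types; if none exists, return null to stand in for the fallback path
    // (where Spark would instead call produceResult(InternalRow)).
    static Object callMagicOrFallback(Object udf, Class<?>[] javaTypes, Object... args)
            throws Exception {
        try {
            Method magic = udf.getClass().getDeclaredMethod("invoke", javaTypes);
            return magic.invoke(udf, args);
        } catch (NoSuchMethodException e) {
            return null; // fallback: no magic method with matching signature
        }
    }

    public static void main(String[] args) throws Exception {
        Object result = callMagicOrFallback(
            new IntegerAdd(), new Class<?>[] { int.class, int.class }, 1, 2);
        System.out.println(result); // prints 3
    }
}
```

Note that reflection boxes/unboxes the primitive arguments, which is part of why the codegen path Spark uses for the real magic-method call is faster than a generic `produceResult(InternalRow)`.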