sunchao commented on a change in pull request #32407:
URL: https://github.com/apache/spark/pull/32407#discussion_r627151754



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
##########
@@ -29,33 +29,62 @@
  * <p>
  * The JVM type of result values produced by this function must be the type used by Spark's
  * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * The mapping between {@link DataType} and the corresponding JVM type is defined below.
  * <p>
  * <b>IMPORTANT</b>: the default implementation of {@link #produceResult} throws
- * {@link UnsupportedOperationException}. Users can choose to override this method, or implement
- * a "magic method" with name {@link #MAGIC_METHOD_NAME} which takes individual parameters
- * instead of a {@link InternalRow}. The magic method will be loaded by Spark through Java
- * reflection and will also provide better performance in general, due to optimizations such as
- * codegen, removal of Java boxing, etc.
- *
- * For example, a scalar UDF for adding two integers can be defined as follow with the magic
+ * {@link UnsupportedOperationException}. Users must choose to either override this method, or
+ * implement a magic method with name {@link #MAGIC_METHOD_NAME}, which takes individual parameters
+ * instead of a {@link InternalRow}. The magic method approach is generally recommended because it
+ * provides better performance over the default {@link #produceResult}, due to optimizations such
+ * as whole-stage codegen, elimination of Java boxing, etc.
+ * <p>
+ * In addition, for functions implemented in Java that are stateless, users can optionally define
+ * the {@link #MAGIC_METHOD_NAME} as a static method, which further avoids certain runtime costs
+ * such as nullness check on the method receiver, potential Java dynamic dispatch, etc.
+ * <p>
+ * For example, a scalar UDF for adding two integers can be defined as follow with the static magic
  * method approach:
  *
  * <pre>
  *   public class IntegerAdd implements{@code ScalarFunction<Integer>} {
+ *     public DataType[] inputTypes() {
+ *       return new DataType[] { DataTypes.IntegerType, DataTypes.IntegerType };
+ *     }
  *     public int invoke(int left, int right) {
  *       return left + right;
  *     }
  *   }
  * </pre>
- * In this case, since {@link #MAGIC_METHOD_NAME} is defined, Spark will use it over
- * {@link #produceResult} to evalaute the inputs. In general Spark looks up the magic method by
- * first converting the actual input SQL data types to their corresponding Java types following
- * the mapping defined below, and then checking if there is a matching method from all the
- * declared methods in the UDF class, using method name (i.e., {@link #MAGIC_METHOD_NAME}) and
- * the Java types. If no magic method is found, Spark will falls back to use {@link #produceResult}.
+ * In the above, since {@link #MAGIC_METHOD_NAME} is defined, and also that it has
+ * matching parameter types and return type, Spark will use it to evaluate inputs.
+ * <p>
+ * As another example, in the following:
+ * <pre>
+ *   public class IntegerAdd implements{@code ScalarFunction<Integer>} {
+ *     public DataType[] inputTypes() {
+ *       return new DataType[] { DataTypes.IntegerType, DataTypes.IntegerType };
+ *     }
+ *     public static int invoke(int left, int right) {
+ *       return left + right;
+ *     }
+ *     public Integer produceResult(InternalRow input) {
+ *       return input.getInt(0) + input.getInt(1);
+ *     }
+ *   }
+ * </pre>
+ *
+ * the class defines both the magic method and the {@link #produceResult}, and Spark will use
+ * {@link #MAGIC_METHOD_NAME} over the {@link #produceResult(InternalRow)} as it takes higher
+ * precedence. Also note that the magic method is annotated as a static method in this case.

Review comment:
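       Java doesn't allow a class to declare both `static int invoke` and `int invoke` with the same parameter types. The `static` modifier is not part of the method signature, so the two declarations clash and the class fails to compile.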
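
To illustrate the point, here is a minimal sketch. It deliberately does not implement Spark's `ScalarFunction`, so it compiles without Spark on the classpath. The class names and the `hasStaticInvoke` helper are hypothetical and are not how Spark itself looks up the magic method; the sketch only shows that a class can carry one `invoke(int, int)` and that reflection can tell whether that one method is static.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class StaticInvokeSketch {
  // Hypothetical UDF-like class with a static magic method.
  static class IntegerAdd {
    public static int invoke(int left, int right) {
      return left + right;
    }
    // Uncommenting the next line makes this class fail to compile, because
    // invoke(int, int) is already defined here; `static` does not change
    // the method signature:
    // public int invoke(int left, int right) { return left + right; }
  }

  // Hypothetical UDF-like class with a non-static magic method.
  static class IntegerAddInstance {
    public int invoke(int left, int right) {
      return left + right;
    }
  }

  // Hypothetical helper: true if `cls` declares a static method named
  // "invoke" with exactly the given parameter types.
  static boolean hasStaticInvoke(Class<?> cls, Class<?>... paramTypes) {
    try {
      Method m = cls.getDeclaredMethod("invoke", paramTypes);
      return Modifier.isStatic(m.getModifiers());
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // prints "true"
    System.out.println(hasStaticInvoke(IntegerAdd.class, int.class, int.class));
    // prints "false"
    System.out.println(hasStaticInvoke(IntegerAddInstance.class, int.class, int.class));
    // prints "5"
    System.out.println(IntegerAdd.invoke(2, 3));
  }
}
```

Since only one of the two forms can exist per parameter list, a lookup by name and parameter types finds at most one candidate, and checking its modifiers is enough to decide whether a receiver object is needed.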






