Hello, I’m writing a custom Spark Catalyst Expression with custom codegen, but Spark (3.0.0) doesn’t seem to generate code for it and instead falls back to interpreted mode.
I created my SparkSession with spark.sql.codegen.factoryMode=CODEGEN_ONLY and spark.sql.codegen.fallback=false, hoping to force Spark to do code generation.

I then have this custom Expression with both interpreted mode and codegen defined:

    import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
    import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
    import org.apache.spark.sql.types.{BooleanType, DataType, StringType}

    // Check whether a string has whitespace on either end
    case class IsTrimmedExpr(child: Expression) extends UnaryExpression with ExpectsInputTypes {
      override def inputTypes: Seq[DataType] = Seq(StringType)
      override lazy val dataType: DataType = BooleanType

      override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
        throw new RuntimeException("expected code gen")
        nullSafeCodeGen(ctx, ev, input => s"($input.trim().equals($input))")
      }

      override protected def nullSafeEval(input: Any): Any = {
        throw new RuntimeException("should not eval")
        val str = input.asInstanceOf[org.apache.spark.unsafe.types.UTF8String]
        str.trim.equals(str)
      }
    }

I register it in the session’s function registry (I know this is an undocumented API, but since the registry is able to find the expression, it shouldn’t affect the execution stage, right?):

    import org.apache.spark.sql.catalyst.FunctionIdentifier

    spark.sessionState.functionRegistry.registerFunction(
      FunctionIdentifier("is_trimmed"),
      { case Seq(s) => IsTrimmedExpr(s) }
    )

To invoke the function/Expression, I do:

    import spark.implicits._

    val df = Seq(" abc", "def", "56 ", " 123 ", "what is a trim").toDF("word")
    df.selectExpr("word", "is_trimmed(word)").show()

The expected behavior is to get the exception from doGenCode(). Instead, I get the exception from nullSafeEval(), which should not run at all.

How do I force Spark to use codegen mode?

Best,
Han
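P.S. For completeness, this is roughly how I create the session (a sketch; the master and app name are placeholders, not my real values):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")        // placeholder
      .appName("codegen-test")   // placeholder
      // Ask Spark to always use generated code and never fall back to interpretation
      .config("spark.sql.codegen.factoryMode", "CODEGEN_ONLY")
      .config("spark.sql.codegen.fallback", "false")
      .getOrCreate()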