xkrogen commented on code in PR #37634:
URL: https://github.com/apache/spark/pull/37634#discussion_r974499166
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:
##########
@@ -252,28 +267,44 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro
""".stripMargin
}
+ /**
+ * Wrap `inputExpr` in a try-catch block that will catch any [[NullPointerException]] that is
+ * thrown, instead throwing a (more helpful) error message as provided by
+ * [[org.apache.spark.sql.errors.QueryExecutionErrors.valueCannotBeNullError]].
+ */
+ private def wrapWithNpeHandling(inputExpr: String, descPath: Seq[String]): String =
+ s"""
+ |try {
+ | ${inputExpr.trim}
Review Comment:
I prefer exception-catching because it handles this issue with zero overhead
when no null is encountered. Adding a null-check here essentially falls back
to the logic for a nullable schema:
https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L119-L133
From the benchmark results, we can see that there is nontrivial overhead for
the null-check; for the simple case of a projection of a primitive, the
overhead is almost 50%:
https://github.com/apache/spark/blob/2a1f9767213c321bd52e7714fa3b5bfc4973ba40/sql/catalyst/benchmarks/UnsafeProjectionBenchmark-jdk17-results.txt#L9-L10
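To make the trade-off concrete, here is a minimal sketch of the two strategies in plain Scala. It is not the generated code from this PR: `ValueCannotBeNull`, `lengthWithNullCheck`, and `lengthWithNpeHandling` are hypothetical stand-ins, and the cost comments reflect how JVMs typically behave rather than anything measured here.

```scala
object NullHandlingSketch {
  // Hypothetical stand-in for the more helpful error; not a Spark class.
  final class ValueCannotBeNull(path: String)
      extends RuntimeException(s"Value at '$path' cannot be null")

  // Explicit null-check: a branch runs on every call, even if the value
  // is never null in practice.
  def lengthWithNullCheck(s: String, path: String): Int =
    if (s == null) throw new ValueCannotBeNull(path) else s.length

  // Exception-catching: no explicit branch on the non-null path. The
  // NullPointerException raised on dereference is translated only when
  // it actually occurs.
  def lengthWithNpeHandling(s: String, path: String): Int =
    try s.length
    catch { case _: NullPointerException => throw new ValueCannotBeNull(path) }
}
```

Both functions behave identically from the caller's point of view; they differ only in where the cost is paid.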
You call out the situation of a null silently being replaced with a default
value; this is a good point. I'm not sure how we can handle that without the
additional overhead of an explicit check. The default-value replacement
appears to come from [Scala's own unboxing
logic](https://github.com/scala/scala/blob/986dcc160aab85298f6cab0bf8dd0345497cdc01/src/library/scala/runtime/BoxesRunTime.java#L102).
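As a small illustration of why no exception surfaces in that case, the sketch below unboxes a null `java.lang.Integer` in two ways. The object and member names are made up for the example; the behavior shown relies on Scala's `asInstanceOf[Int]` on a boxed value going through `BoxesRunTime.unboxToInt`, which maps null to 0.

```scala
object UnboxingSketch {
  // A boxed null, as might arrive despite a non-nullable schema.
  val boxed: java.lang.Integer = null

  // The Scala cast on an `Any` unboxes via BoxesRunTime.unboxToInt,
  // which returns 0 for null instead of throwing.
  val viaScalaCast: Int = (boxed: Any).asInstanceOf[Int]

  // Calling intValue() directly dereferences the null reference and
  // throws a NullPointerException.
  def viaIntValue: Int = boxed.intValue()
}
```

Only the second path produces a `NullPointerException` that a try-catch wrapper could translate; the first yields a default value silently.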
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]