Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21599#discussion_r197246916
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
@@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends BinaryOperator with NullIntolerant {
   def calendarIntervalMethod: String =
     sys.error("BinaryArithmetics must override either calendarIntervalMethod or genCode")

+  def checkOverflowCode(result: String, op1: String, op2: String): String =
+    sys.error("BinaryArithmetics must override either checkOverflowCode or genCode")
+
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match {
     case _: DecimalType =>
       defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$decimalMethod($eval2)")
     case CalendarIntervalType =>
       defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$calendarIntervalMethod($eval2)")
+    // In the following cases, overflow can happen, so we need to check the result is valid.
+    // Otherwise we throw an ArithmeticException
--- End diff --
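For context on the new hook: checkOverflowCode presumably returns a String of Java source that the generated code runs after the raw arithmetic, throwing when the result is out of range. A hypothetical sketch of what an integer Add-like operator might emit (illustrative only, not code from this PR; it reuses the sign-bit trick from java.lang.Math.addExact):

    // Hypothetical implementation of the hook for illustration: emit a Java
    // snippet that detects signed-int addition overflow and throws.
    def checkOverflowCode(result: String, op1: String, op2: String): String =
      s"""
         |if ((($op1 ^ $result) & ($op2 ^ $result)) < 0) {
         |  throw new ArithmeticException("integer overflow");
         |}
       """.stripMargin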
Personally, I am quite against returning null. It is not something a user expects, so they are unlikely to check for it (when I see a NULL myself, I assume one of the two operands was NULL, not that an overflow occurred); they won't realize the issue and will end up with corrupted data. Moreover, this is not how RDBMSs behave, and it is against the SQL standard. So I think the behavior chosen for DECIMAL was wrong, and I'd prefer not to introduce the same behavior in other places.
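To make the difference concrete, here is a minimal sketch of the two options for plain Int addition (illustrative helpers only, not code from this PR):

    // Option discussed above: fail loudly, as most RDBMSs and the SQL standard do.
    def addOrThrow(a: Int, b: Int): Int = {
      val r = a + b
      // Overflow occurred iff both operands have the same sign and the result has the opposite sign.
      if (((a ^ r) & (b ^ r)) < 0) throw new ArithmeticException(s"integer overflow: $a + $b")
      r
    }

    // The alternative: silently return null on overflow, as DECIMAL currently does.
    def addOrNull(a: Int, b: Int): java.lang.Integer = {
      val r = a + b
      if (((a ^ r) & (b ^ r)) < 0) null else Integer.valueOf(r)
    }

With the second option, a NULL in the output is indistinguishable from a NULL operand, which is exactly the confusion I am worried about.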
Anyway, I see your point about consistency across the codebase, and it makes sense.
I'd love to hear @gatorsmile's and @hvanhovell's opinions too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]