[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21599#discussion_r199632638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends BinaryOperator with NullIntolerant { def calendarIntervalMethod: String = sys.error("BinaryArithmetics must override either calendarIntervalMethod or genCode") + def checkOverflowCode(result: String, op1: String, op2: String): String = +sys.error("BinaryArithmetics must override either checkOverflowCode or genCode") + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match { case _: DecimalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$decimalMethod($eval2)") case CalendarIntervalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$calendarIntervalMethod($eval2)") +// In the following cases, overflow can happen, so we need to check the result is valid. +// Otherwise we throw an ArithmeticException --- End diff -- @gatorsmile @hvanhovell do you have time to check this and give your opinion here? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21599#discussion_r197246916 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends BinaryOperator with NullIntolerant { def calendarIntervalMethod: String = sys.error("BinaryArithmetics must override either calendarIntervalMethod or genCode") + def checkOverflowCode(result: String, op1: String, op2: String): String = +sys.error("BinaryArithmetics must override either checkOverflowCode or genCode") + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match { case _: DecimalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$decimalMethod($eval2)") case CalendarIntervalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$calendarIntervalMethod($eval2)") +// In the following cases, overflow can happen, so we need to check the result is valid. +// Otherwise we throw an ArithmeticException --- End diff -- Personally, I am quite against returning null. It is not something a user expects, so he/she is likely not to check for it (when I see a NULL myself, I think that one of the 2 operands was NULL, not that an overflow occurred), so he/she won't realize the issue and would find corrupted data. Moreover, this is not how RDBMS behaves and it is against SQL standard. So I think that the behavior which was chosen for DECIMAL was wrong and I'd prefer not to introduce the same behavior also in other places. Anyway I see your point about consistency over the codebase and it makes sense. I'd love to know @gatorsmile and @hvanhovell's opinions too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21599#discussion_r197235683 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends BinaryOperator with NullIntolerant { def calendarIntervalMethod: String = sys.error("BinaryArithmetics must override either calendarIntervalMethod or genCode") + def checkOverflowCode(result: String, op1: String, op2: String): String = +sys.error("BinaryArithmetics must override either checkOverflowCode or genCode") + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = dataType match { case _: DecimalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$decimalMethod($eval2)") case CalendarIntervalType => defineCodeGen(ctx, ev, (eval1, eval2) => s"$eval1.$calendarIntervalMethod($eval2)") +// In the following cases, overflow can happen, so we need to check the result is valid. +// Otherwise we throw an ArithmeticException --- End diff -- In current Spark we are very conservative about runtime error, as it may break the data pipeline middle away, and returning null is a commonly used strategy. Shall we follow it here? We can throw exception when we have a strict mode. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/21599 [SPARK-24598][SQL] Overflow on arithmetic operations returns incorrect result ## What changes were proposed in this pull request? When an overflow occurs performing an arithmetic operation, we are returning an incorrect value. Instead, we should throw an exception, as stated in the SQL standard. ## How was this patch tested? added UT + existing UTs (improved) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-24598 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21599.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21599 commit 5c662f6987b7cdad016e4134f980fd0b8e02d220 Author: Marco Gaido Date: 2018-06-20T12:20:04Z [SPARK-24598][SQL] Overflow on airthmetic operation returns incorrect result --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org