[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...

2018-07-02 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21599#discussion_r199632638
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends 
BinaryOperator with NullIntolerant {
   def calendarIntervalMethod: String =
 sys.error("BinaryArithmetics must override either 
calendarIntervalMethod or genCode")
 
+  def checkOverflowCode(result: String, op1: String, op2: String): String =
+sys.error("BinaryArithmetics must override either checkOverflowCode or 
genCode")
+
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = 
dataType match {
 case _: DecimalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$decimalMethod($eval2)")
 case CalendarIntervalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$calendarIntervalMethod($eval2)")
+// In the following cases, overflow can happen, so we need to check 
the result is valid.
+// Otherwise we throw an ArithmeticException
--- End diff --

@gatorsmile @hvanhovell do you have time to check this and give your 
opinion here? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...

2018-06-21 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21599#discussion_r197246916
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends 
BinaryOperator with NullIntolerant {
   def calendarIntervalMethod: String =
 sys.error("BinaryArithmetics must override either 
calendarIntervalMethod or genCode")
 
+  def checkOverflowCode(result: String, op1: String, op2: String): String =
+sys.error("BinaryArithmetics must override either checkOverflowCode or 
genCode")
+
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = 
dataType match {
 case _: DecimalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$decimalMethod($eval2)")
 case CalendarIntervalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$calendarIntervalMethod($eval2)")
+// In the following cases, overflow can happen, so we need to check 
the result is valid.
+// Otherwise we throw an ArithmeticException
--- End diff --

Personally, I am quite against returning null. It is not something a user 
expects, so he/she is likely not to check for it (when I see a NULL myself, I 
think that one of the 2 operands was NULL, not that an overflow occurred), so 
he/she won't realize the issue and would find corrupted data. Moreover, this is 
not how RDBMS behaves and it is against SQL standard. So I think that the 
behavior which was chosen for DECIMAL was wrong and I'd prefer not to introduce 
the same behavior also in other places.

Anyway I see your point about consistency over the codebase and it makes 
sense.

I'd love to know @gatorsmile and @hvanhovell's opinions too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...

2018-06-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21599#discussion_r197235683
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala
 ---
@@ -128,17 +128,31 @@ abstract class BinaryArithmetic extends 
BinaryOperator with NullIntolerant {
   def calendarIntervalMethod: String =
 sys.error("BinaryArithmetics must override either 
calendarIntervalMethod or genCode")
 
+  def checkOverflowCode(result: String, op1: String, op2: String): String =
+sys.error("BinaryArithmetics must override either checkOverflowCode or 
genCode")
+
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = 
dataType match {
 case _: DecimalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$decimalMethod($eval2)")
 case CalendarIntervalType =>
   defineCodeGen(ctx, ev, (eval1, eval2) => 
s"$eval1.$calendarIntervalMethod($eval2)")
+// In the following cases, overflow can happen, so we need to check 
the result is valid.
+// Otherwise we throw an ArithmeticException
--- End diff --

In current Spark we are very conservative about runtime error, as it may 
break the data pipeline middle away, and returning null is a commonly used 
strategy. Shall we follow it here? We can throw exception when we have a strict 
mode.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21599: [SPARK-24598][SQL] Overflow on arithmetic operati...

2018-06-20 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/21599

[SPARK-24598][SQL] Overflow on arithmetic operations returns incorrect 
result

## What changes were proposed in this pull request?

When an overflow occurs performing an arithmetic operation, we are 
returning an incorrect value. Instead, we should throw an exception, as stated 
in the SQL standard. 

## How was this patch tested?

added UT + existing UTs (improved)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-24598

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21599


commit 5c662f6987b7cdad016e4134f980fd0b8e02d220
Author: Marco Gaido 
Date:   2018-06-20T12:20:04Z

[SPARK-24598][SQL] Overflow on airthmetic operation returns incorrect result




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org