[GitHub] [spark] cloud-fan commented on a change in pull request #29026: [SPARK-28067][SPARK-32018] Fix decimal overflow issues

2020-07-07 Thread GitBox


cloud-fan commented on a change in pull request #29026:
URL: https://github.com/apache/spark/pull/29026#discussion_r451258078



##
File path: sql/core/src/test/scala/org/apache/spark/sql/UnsafeRowSuite.scala
##
@@ -178,4 +178,14 @@ class UnsafeRowSuite extends SparkFunSuite {
     // Makes sure hashCode on unsafe array won't crash
     unsafeRow.getArray(0).hashCode()
   }
+
+  test("SPARK-32018: setDecimal with overflowed value") {
+    val d1 = new Decimal().set(BigDecimal("1000")).toPrecision(38, 18)
+    val row = InternalRow.apply(d1)
+    val unsafeRow = UnsafeProjection.create(Array[DataType](DecimalType(38, 18))).apply(row)
+    assert(unsafeRow.getDecimal(0, 38, 18) === d1)
+    val d2 = (d1 * Decimal(10)).toPrecision(39, 18)
+    unsafeRow.setDecimal(0, d2, 38)
+    assert(unsafeRow.getDecimal(0, 38, 18) === null)

Review comment:
   `UnsafeRow` is a low-level entity and doesn't respect the ANSI flag.
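
   To illustrate the behavior the test above checks, here is a minimal, self-contained Java sketch. It is not Spark's `UnsafeRow` or `Decimal` code: it uses `java.math.BigDecimal`, the helper name `fitOrNull` is hypothetical, and the 20-digit literal is chosen only so that the value exactly fills precision 38 at scale 18. The point is that the check consults no ANSI flag; a value with too many digits simply comes back as null.

```java
import java.math.BigDecimal;

// Sketch only: mimics "store null when the value does not fit the precision".
// This is an assumption-based illustration, not Spark's implementation.
final class SetDecimalOverflowSketch {

    // Hypothetical helper: returns the value if its digit count fits
    // `precision` at its current scale, otherwise null. No ANSI flag involved.
    static BigDecimal fitOrNull(BigDecimal value, int precision) {
        if (value == null) {
            return null;
        }
        // BigDecimal.precision() is the number of digits in the unscaled value.
        return value.precision() <= precision ? value : null;
    }

    public static void main(String[] args) {
        // 20 integer digits + 18 fraction digits = 38 digits in total.
        BigDecimal d1 = new BigDecimal("10000000000000000000").setScale(18);
        // Multiplying by 10 adds one more digit, which exceeds precision 38.
        BigDecimal d2 = d1.multiply(BigDecimal.TEN);

        System.out.println("d1 fits: " + (fitOrNull(d1, 38) != null));
        System.out.println("d2 stored as null: " + (fitOrNull(d2, 38) == null));
    }
}
```

   Returning null instead of throwing keeps the low-level setter free of session configuration; any ANSI-style failure would have to be raised by a higher layer.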





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29026: [SPARK-28067][SPARK-32018] Fix decimal overflow issues

2020-07-07 Thread GitBox


cloud-fan commented on a change in pull request #29026:
URL: https://github.com/apache/spark/pull/29026#discussion_r451257595



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
##
@@ -58,39 +58,50 @@ case class Sum(child: Expression) extends DeclarativeAggregate with ImplicitCast
     case _ => DoubleType
   }
 
-  private lazy val sumDataType = resultType
-
-  private lazy val sum = AttributeReference("sum", sumDataType)()
+  private lazy val sum = AttributeReference("sum", resultType)()
 
   private lazy val isEmpty = AttributeReference("isEmpty", BooleanType, nullable = false)()
 
-  private lazy val zero = Literal.default(sumDataType)
+  private lazy val zero = Literal.default(resultType)
 
   override lazy val aggBufferAttributes = resultType match {
     case _: DecimalType => sum :: isEmpty :: Nil
     case _ => sum :: Nil
   }
 
   override lazy val initialValues: Seq[Expression] = resultType match {
-    case _: DecimalType => Seq(Literal(null, resultType), Literal(true, BooleanType))
+    case _: DecimalType => Seq(zero, Literal(true, BooleanType))
     case _ => Seq(Literal(null, resultType))
   }
 
   override lazy val updateExpressions: Seq[Expression] = {
-    if (child.nullable) {
-      val updateSumExpr = coalesce(coalesce(sum, zero) + child.cast(sumDataType), sum)
-      resultType match {
-        case _: DecimalType =>
-          Seq(updateSumExpr, isEmpty && child.isNull)
-        case _ => Seq(updateSumExpr)
-      }
-    } else {
-      val updateSumExpr = coalesce(sum, zero) + child.cast(sumDataType)
-      resultType match {
-        case _: DecimalType =>
-          Seq(updateSumExpr, Literal(false, BooleanType))
-        case _ => Seq(updateSumExpr)
-      }
+    resultType match {
+      case _: DecimalType =>
+        // For decimal type, the initial value of `sum` is 0. We need to keep `sum` unchanged if
+        // the input is null, as SUM function ignores null input. The `sum` can only be null if
+        // overflow happens under non-ansi mode.

Review comment:
   It's the `Add` expression, and it always respects ANSI mode, regardless of whether it's in the update or the merge expression.
   
   This makes sense. If overflow happens, we will fail anyway, so it's better to fail earlier and save resources.
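
   The buffer semantics described in the code comment above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the Catalyst expressions from the PR: the class `DecimalSumSketch`, its `maxPrecision` parameter, and the `ansi` flag are hypothetical names, and `java.math.BigDecimal` stands in for Spark's `Decimal`. It shows `sum` starting at 0, null inputs leaving the buffer untouched, and overflow either failing immediately (ANSI) or turning `sum` into null (non-ANSI).

```java
import java.math.BigDecimal;

// Sketch only: a hand-written aggregation buffer with `sum` and `isEmpty`,
// assumed to mirror the semantics discussed in the review comment.
final class DecimalSumSketch {
    private final int maxPrecision; // hypothetical stand-in for the result type's precision
    private final boolean ansi;     // hypothetical stand-in for the ANSI mode setting

    private BigDecimal sum = BigDecimal.ZERO; // initial value is 0, not null
    private boolean isEmpty = true;           // true until a non-null input is seen

    DecimalSumSketch(int maxPrecision, boolean ansi) {
        this.maxPrecision = maxPrecision;
        this.ansi = ansi;
    }

    void update(BigDecimal input) {
        if (input == null) {
            return; // SUM ignores null input: keep `sum` and `isEmpty` unchanged
        }
        isEmpty = false;
        if (sum == null) {
            return; // already overflowed in non-ANSI mode; stays null
        }
        BigDecimal next = sum.add(input);
        if (next.precision() > maxPrecision) {
            if (ansi) {
                // Fail at the point of overflow instead of at the end.
                throw new ArithmeticException("decimal overflow in sum");
            }
            sum = null; // non-ANSI: overflow is recorded as null
        } else {
            sum = next;
        }
    }

    // Null for an empty input set, null after non-ANSI overflow, else the sum.
    BigDecimal result() {
        return isEmpty ? null : sum;
    }

    public static void main(String[] args) {
        DecimalSumSketch s = new DecimalSumSketch(3, false);
        s.update(new BigDecimal("999"));
        s.update(new BigDecimal("1")); // 1000 needs 4 digits, exceeding precision 3
        System.out.println("non-ANSI result after overflow: " + s.result());
    }
}
```

   Keeping `isEmpty` separate from `sum` is what lets the final result distinguish "no non-null rows" from "overflowed", since both would otherwise look like a null sum.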






[GitHub] [spark] cloud-fan commented on a change in pull request #29026: [SPARK-28067][SPARK-32018] Fix decimal overflow issues

2020-07-07 Thread GitBox


cloud-fan commented on a change in pull request #29026:
URL: https://github.com/apache/spark/pull/29026#discussion_r451249339



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
##
@@ -288,7 +288,7 @@ public void setDecimal(int ordinal, Decimal value, int precision) {
       Platform.putLong(baseObject, baseOffset + cursor, 0L);
       Platform.putLong(baseObject, baseOffset + cursor + 8, 0L);
 
-      if (value == null) {
+      if (value == null || !value.changePrecision(precision, value.scale())) {

Review comment:
   Yes, we should.






[GitHub] [spark] cloud-fan commented on a change in pull request #29026: [SPARK-28067][SPARK-32018] fix decimal overflow issues

2020-07-07 Thread GitBox


cloud-fan commented on a change in pull request #29026:
URL: https://github.com/apache/spark/pull/29026#discussion_r450994003



##
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
##
@@ -288,7 +288,7 @@ public void setDecimal(int ordinal, Decimal value, int precision) {
       Platform.putLong(baseObject, baseOffset + cursor, 0L);
       Platform.putLong(baseObject, baseOffset + cursor + 8, 0L);
 
-      if (value == null) {
+      if (value == null || !value.changePrecision(precision, value.scale())) {

Review comment:
   Thanks to @allisonwang-db for catching this bug!




