[GitHub] [spark] cloud-fan commented on a change in pull request #29458: [SPARK-32018][FOLLOWUP][Doc] Add migration guide for decimal value overflow in sum aggregation

GitBox Mon, 17 Aug 2020 21:27:23 -0700


cloud-fan commented on a change in pull request #29458:
URL: https://github.com/apache/spark/pull/29458#discussion_r471906037




##########
File path: docs/sql-migration-guide.md
##########
@@ -36,6 +36,10 @@ license: |
 
   - In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
 
+  - In Spark 3.1, when `spark.sql.ansi.enabled` is false, sum aggregation of decimal type column always returns `null` on decimal value overflow. In Spark 3.0 or earlier, when `spark.sql.ansi.enabled` is false and decimal value overflow happens in sum aggregation of decimal type column:
+    - If it is hash aggregation with `group by` clause, a runtime exception is thrown.

Review comment:
       Not many users are familiar with physical plan nodes. How about:
   ```
   In Spark 3.1, Spark always returns null if the sum of a decimal column overflows under non-ANSI
   mode (`spark.sql.ansi.enabled` is false). In Spark 3.0 or earlier, the sum of a decimal column may
   fail at runtime under non-ANSI mode (when the query has GROUP BY and is planned as a hash aggregate).
   ```
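
To make the proposed wording concrete, here is a minimal sketch of the Spark 3.1 semantics in plain Python. It is a simulation, not Spark code: `sum_decimal` and its parameters are hypothetical names, `precision` and `scale` stand for the result type of the sum, and raising an error when `ansi_enabled` is true is an assumption about ANSI mode that the suggested text itself does not state.

```python
from decimal import Decimal, localcontext
from typing import Iterable, Optional


def sum_decimal(
    values: Iterable[Decimal],
    precision: int,
    scale: int,
    ansi_enabled: bool = False,
) -> Optional[Decimal]:
    """Simulate summing into a DECIMAL(precision, scale) result (hypothetical helper)."""
    with localcontext() as ctx:
        # Extra headroom so Python's own Decimal arithmetic never rounds the total.
        ctx.prec = precision * 2
        # A DECIMAL(precision, scale) holds magnitudes strictly below 10^(precision - scale).
        bound = Decimal(10) ** (precision - scale)
        total = Decimal(0)
        for value in values:
            total += value
        if abs(total) >= bound:
            if ansi_enabled:
                # Assumption: ANSI mode reports overflow as an error.
                raise ArithmeticError("decimal sum overflow")
            # Non-ANSI mode, as described for Spark 3.1: overflow yields null.
            return None
        return total


if __name__ == "__main__":
    big = Decimal("90000000000000000000")  # 9 * 10^19, fits DECIMAL(38, 18) on its own
    print(sum_decimal([big, big], precision=38, scale=18))
    print(sum_decimal([Decimal("1.5"), Decimal("2.5")], precision=38, scale=18))
```

The first call exceeds the range of `DECIMAL(38, 18)` and so returns `None`, which corresponds to the `null` described above. The second call stays in range and returns the ordinary sum.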





----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

