Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21499#discussion_r193341970
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
@@ -161,13 +161,17 @@ object DecimalType extends AbstractDataType {
* This method is used only when `spark.sql.decimalOperations.allowPrecisionLoss` is set to true.
*/
private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
- // Assumptions:
+ // Assumption:
assert(precision >= scale)
- assert(scale >= 0)
if (precision <= MAX_PRECISION) {
// Adjustment only needed when we exceed max precision
DecimalType(precision, scale)
+ } else if (scale < 0) {
+ // Decimal can have negative scale (SPARK-24468). In this case, we cannot allow a precision
+ // loss since we would cause a loss of digits in the integer part.
--- End diff --
Unfortunately, the SQL standard says nothing about this. It only states:
> the result is of a numeric type that depends only on the numeric type of the operands. If the result of an arithmetic operation cannot be represented exactly in its result type, then whether the result is rounded or truncated is implementation-defined. An exception condition is raised if the result is outside the range of numeric values of the result type,
It says nothing about what the result scale/precision should be (only that it has to depend on the input types). So it is up to us to decide which approach to follow (i.e. whether or not to be more aggressive in allowing precision loss).
The reasons for my choice are:
- In the current implementation, when we allow precision loss we still enforce at least 6 decimal digits of precision. So we are guaranteeing to our users that any (successful) arithmetic operation will have at least 6 decimal digits of precision. It would be very weird to me to forbid losing precision for operations whose result is representable with 4 decimal digits, while allowing data loss on the integer part.
- Hive and SQL Server, on which we base our implementation, do not allow negative scale, so we already support more operations than they do; and, as a consequence, they never allow any precision loss on the integer part;
- I found no other RDBMS allowing a precision loss on the integer part.
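To make the guarantee above concrete, here is a minimal, self-contained Scala sketch of the adjustment logic. The object and method names are illustrative, not Spark's API; the constants mirror Spark's `DecimalType.MAX_PRECISION` (38) and `MINIMUM_ADJUSTED_SCALE` (6):

```scala
object AdjustPrecision {
  val MaxPrecision = 38          // mirrors Spark's DecimalType.MAX_PRECISION
  val MinimumAdjustedScale = 6   // mirrors Spark's MINIMUM_ADJUSTED_SCALE

  // Sketch of the precision-loss adjustment discussed above.
  // Returns the adjusted (precision, scale) pair.
  def adjust(precision: Int, scale: Int): (Int, Int) = {
    require(precision >= scale)
    if (precision <= MaxPrecision) {
      // No adjustment needed: the type already fits.
      (precision, scale)
    } else if (scale < 0) {
      // Negative scale: never drop integer digits, just cap the precision.
      (MaxPrecision, scale)
    } else {
      val intDigits = precision - scale
      val minScale = math.min(scale, MinimumAdjustedScale)
      // Keep all integer digits if possible; shrink the scale,
      // but never below the 6-digit minimum (or the requested scale).
      val adjustedScale = math.max(MaxPrecision - intDigits, minScale)
      (MaxPrecision, adjustedScale)
    }
  }
}
```

For example, a requested `(40, 20)` is adjusted to `(38, 18)`: all 20 integer digits are kept and the fractional part absorbs the loss, staying well above the 6-digit floor.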
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]