Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21499#discussion_r193341970
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
    @@ -161,13 +161,17 @@ object DecimalType extends AbstractDataType {
        * This method is used only when `spark.sql.decimalOperations.allowPrecisionLoss` is set to true.
        */
       private[sql] def adjustPrecisionScale(precision: Int, scale: Int): DecimalType = {
    -    // Assumptions:
    +    // Assumption:
         assert(precision >= scale)
    -    assert(scale >= 0)
     
         if (precision <= MAX_PRECISION) {
           // Adjustment only needed when we exceed max precision
           DecimalType(precision, scale)
    +    } else if (scale < 0) {
    +      // Decimal can have negative scale (SPARK-24468). In this case, we cannot allow a precision
    +      // loss since we would cause a loss of digits in the integer part.
    --- End diff --
    
    Unfortunately, the SQL standard says nothing about this. It only states:
    
    > the result is of a numeric type that depends only on the numeric type of the operands. If the result of an arithmetic operation cannot be represented exactly in its result type, then whether the result is rounded or truncated is implementation-defined. An exception condition is raised if the result is outside the range of numeric values of the result type,
    
    It says nothing about what the result scale/precision should be (only that it has to be based on the input types), so it is up to us to decide which approach to follow (i.e. whether or not to be more aggressive in allowing precision loss).
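
    To make the discussion below concrete, here is a minimal, self-contained sketch of the approach I am proposing. It is not the PR code itself: the object name and the returned (precision, scale) pair are stand-ins I made up for illustration, and the constants 38 and 6 mirror MAX_PRECISION and MINIMUM_ADJUSTED_SCALE in DecimalType.

        object AdjustPrecisionScaleSketch {
          val MAX_PRECISION = 38
          val MINIMUM_ADJUSTED_SCALE = 6

          // Returns the adjusted (precision, scale) pair; it stands in for DecimalType(p, s).
          def adjustPrecisionScale(precision: Int, scale: Int): (Int, Int) = {
            assert(precision >= scale)
            if (precision <= MAX_PRECISION) {
              // No adjustment needed: the requested type already fits.
              (precision, scale)
            } else if (scale < 0) {
              // Negative scale (SPARK-24468): keep the scale and only cap the precision,
              // because shrinking the scale here would drop digits from the integer part.
              (MAX_PRECISION, scale)
            } else {
              // Positive scale: give the integer part what it needs, but keep at least
              // MINIMUM_ADJUSTED_SCALE fractional digits (or the original scale, if smaller).
              val intDigits = precision - scale
              val minScale = math.min(scale, MINIMUM_ADJUSTED_SCALE)
              val adjustedScale = math.max(MAX_PRECISION - intDigits, minScale)
              (MAX_PRECISION, adjustedScale)
            }
          }
        }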
    
    The reasons for my choice are:
     - In the current implementation, when we allow precision loss we still enforce at least 6 digits after the decimal point, so we are already guaranteeing our users that any (successful) arithmetic operation keeps at least 6 fractional digits. It would be very weird not to allow losing precision for operations that would be representable with 4 fractional digits, while allowing data loss on the integer part (see the example after this list).
     - Hive and SQL Server, on which we base our implementation, do not allow negative scale, so we already represent more operations than they do; as a consequence, they never allow any precision loss on the integer part.
     - I found no other RDBMS allowing a precision loss on the integer part.
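
    To make the first point concrete, a couple of illustrative calls against the sketch above (the input numbers are made up, just chosen to exceed MAX_PRECISION):

        // Positive scale: fractional digits are trimmed from 10 down to the guaranteed
        // minimum of 6; all 32 integer digits still fit.
        AdjustPrecisionScaleSketch.adjustPrecisionScale(42, 10)   // -> (38, 6)
        // Negative scale: the scale is left untouched, since trimming it would drop digits
        // from the integer part, which is exactly the loss this change rules out.
        AdjustPrecisionScaleSketch.adjustPrecisionScale(44, -4)   // -> (38, -4)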


---
