mgaido91 commented on issue #22450: [SPARK-25454][SQL] Avoid precision loss in 
division with decimal with negative scale
URL: https://github.com/apache/spark/pull/22450#issuecomment-473184079
 
 
   Thanks for your comment, @cloud-fan. I am not sure what you are proposing 
here.
   
   > AFAIK many database follow the SQL way to define the decimal type, i.e. 0 
<= scale <= precision
   
   I'd argue that this is not true. Many SQL DBs do not have this rule, although 
some do have it (e.g. SQL Server).
   
   I tried to state the tradeoffs of the different choices on the mailing 
list, i.e.:
    - If we require the scale to be non-negative, we have a backward 
compatibility issue, since there may be operations which we are no longer able 
to support (not just producing different results: we would be unable to 
represent the result at all, causing a failure);
    - If we leave things as they are now, i.e. we allow negative scale, let me 
comment on your 5 points.
   
   > 1. we would want to support scale > precision as well, to be consistent 
with the Java way to define decimal type.
   
   I am not sure this is a good idea in terms of compatibility with data 
sources. This is a corner case which doesn't exist now, so introducing it may 
lead to problems there, and once it is in we may be unable to remove the 
support for backward compatibility reasons. We may try to do that under a 
config flag, though.
   
   > 2. need to fix some corner cases of precision loss(what this PR is trying 
to fix)
   > 3. bad compatibility with data sources. (sql("select 1e10 as 
a").write.parquet("/tmp/tt") would fail)
   
   Yes, this is indeed a problem, but it is a problem which is already present, 
so I think we can consider this PR independent from this issue, as it changes 
nothing wrt it.
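   
   For reference, here is a minimal sketch of what a negative scale and 
scale > precision look like in `java.math.BigDecimal`. It only illustrates the 
Java representation (value = unscaledValue * 10^-scale); it is not Spark's 
`Decimal`/`DecimalType` code, and the class name `DecimalScaleDemo` and the 
helper `describe` are made up for this example.

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    // Formats a value as "decimal(precision, scale)" using BigDecimal's own
    // definitions: precision = digits of the unscaled value, scale = negated exponent.
    static String describe(BigDecimal d) {
        return "decimal(" + d.precision() + ", " + d.scale() + ")";
    }

    public static void main(String[] args) {
        // 1e10 is stored as unscaled value 1 with scale -10 (negative scale).
        BigDecimal big = new BigDecimal("1e10");
        // 0.001 is stored as unscaled value 1 with scale 3 (scale > precision).
        BigDecimal small = new BigDecimal("0.001");

        System.out.println(big.toPlainString() + " -> " + describe(big));
        System.out.println(small.toPlainString() + " -> " + describe(small));
    }
}
```

   The first value has a negative scale and the second has scale > precision; 
neither satisfies 0 <= scale <= precision.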
   
   > 4. may have unknown pitfalls, as it's not widely supported by other 
databases.
   
   Not sure what you mean here.
   
   > 5. fully backward compatible
   
   This PR is fully backward compatible: nothing changes when the scale is 
non-negative, and when the scale is negative it fixes the computation of the 
precision of the result of division, so the only change is the precision of 
the result of division. We can also introduce a config to stay on the safe 
side, but this is basically a fix for a situation which was handled badly 
before, so I see no reason to turn off this behavior...

