mgaido91 commented on issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with decimal with negative scale
URL: https://github.com/apache/spark/pull/22450#issuecomment-473184079

Thanks for your comment @cloud-fan. I am not sure what you are proposing here.

> AFAIK many database follow the SQL way to define the decimal type, i.e. 0 <= scale <= precision

I'd argue that this is not true. Many SQL DBs do not have this rule, although some indeed do (e.g. SQLServer). I tried to state the tradeoffs of the different choices on the mailing list, i.e.:

- If we ensure that the scale must be non-negative, we have a backward compatibility issue, since there may be operations which we are no longer able to support (not just producing different results: we would be unable to represent the result at all, causing a failure);
- If we leave things as they are now, i.e. we allow negative scales, let me comment on your 5 points.

> 1. we would want to support scale > precision as well, to be consistent with the Java way to define decimal type.

I am not sure this is a good idea in terms of compatibility with data sources. Since this is a corner case which doesn't exist now, introducing it may lead to problems in that respect, and we might then be unable to remove the support for backward compatibility reasons. We could try doing that under a config flag, though.

> 2. need to fix some corner cases of precision loss(what this PR is trying to fix)
> 3. bad compatibility with data sources. (sql("select 1e10 as a").write.parquet("/tmp/tt") would fail)

Yes, this is indeed a problem, but it is a problem which is already present, so I think we can consider this PR independent of it, as the PR changes nothing with respect to it.

> 4. may have unknown pitfalls, as it's not widely supported by other databases.

Not sure what you mean here.

> 5. fully backward compatible

This PR is, since there is no change when the scale is non-negative. When the scale is negative, it fixes the computation of the precision of the result of division, so the only change is the precision of the result of division. We can also introduce a config to stay on the safe side, but this is basically a fix for a situation which was handled badly before, so I see no reason to turn off this behavior...
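
To make the decimal shapes discussed above concrete, here is a minimal Java sketch using only `java.math.BigDecimal`. It is not Spark code: the class name `NegativeScaleDemo` and its helper methods are hypothetical, introduced only for illustration. It shows that Java's decimal accepts both a negative scale (as for the literal `1e10`) and a scale larger than the precision, and what the same value looks like once it is rescaled to satisfy 0 <= scale <= precision.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of Spark or of this PR.
final class NegativeScaleDemo {

    // Returns {precision, scale} of the decimal parsed from the literal.
    static int[] precisionAndScale(String literal) {
        BigDecimal d = new BigDecimal(literal);
        return new int[] { d.precision(), d.scale() };
    }

    // Rescales a value with a negative scale to scale 0, so that it fits
    // the stricter rule 0 <= scale <= precision. Widening the scale never
    // needs rounding, so setScale(0) cannot throw here.
    static BigDecimal toNonNegativeScale(BigDecimal d) {
        return d.scale() < 0 ? d.setScale(0) : d;
    }

    public static void main(String[] args) {
        int[] big = precisionAndScale("1e10");    // precision 1, scale -10
        int[] tiny = precisionAndScale("0.001");  // precision 1, scale 3
        System.out.println("1e10  -> precision=" + big[0] + ", scale=" + big[1]);
        System.out.println("0.001 -> precision=" + tiny[0] + ", scale=" + tiny[1]);

        BigDecimal widened = toNonNegativeScale(new BigDecimal("1e10"));
        // Same numeric value, now precision 11 and scale 0.
        System.out.println("widened -> precision=" + widened.precision()
                + ", scale=" + widened.scale());
    }
}
```

The rescaled value needs 11 digits of precision instead of 1, which hints at why forbidding negative scales can make some results unrepresentable once the maximum precision is reached.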
