Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/20023
  
    Thanks for your analysis @gatorsmile. Actually, the rule you cited for
Oracle is the one it uses when casting, rather than when performing arithmetic
operations. Yes, DB2 has rather different rules for defining the output type of
operations. Anyway, we can get a behavior practically identical to DB2's by
changing the value of `MINIMUM_ADJUSTED_SCALE` to 31. Therefore, instead of
using the configuration you pointed out, I'd propose adding a configuration for
`MINIMUM_ADJUSTED_SCALE`: by changing it we can get both the Hive/SQLServer
behavior and the DB2 one. What do you think?
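
    To make the proposal concrete, here is a rough sketch (not the code in
this PR) of the precision/scale adjustment rule as I understand it from Hive
and SQLServer, with the minimum adjusted scale exposed as a parameter:

```scala
object DecimalAdjustment {
  val MaxPrecision = 38

  // Reduce an unbounded result type (precision, scale) so it fits into
  // DECIMAL(38, _): the integral digits are preserved and the scale is
  // sacrificed, but never below `minimumAdjustedScale`.
  def adjust(precision: Int, scale: Int, minimumAdjustedScale: Int): (Int, Int) = {
    if (precision <= MaxPrecision) {
      (precision, scale)
    } else {
      val intDigits = precision - scale
      val minScale = math.min(scale, minimumAdjustedScale)
      val adjustedScale = math.max(MaxPrecision - intDigits, minScale)
      (MaxPrecision, adjustedScale)
    }
  }
}

// e.g. DECIMAL(38,10) * DECIMAL(38,10) would need DECIMAL(77,20):
// adjust(77, 20, 6)  == (38, 6)   -- Hive / SQLServer behavior
// adjust(77, 20, 31) == (38, 20)  -- the scale is kept, close to DB2
```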
    
    The reason why I am suggesting this is that my first concern is not Hive
compliance, but SQL standard compliance. Indeed, as you can see from the
summary, on point 1 there is no uniform behavior (which is fine with the SQL
standard, since it leaves freedom there). But on point 2 we are the only ones
who are not compliant with the SQL standard, and having this behavior by
default doesn't seem like the right thing to do, IMHO. On point 3, only we and
Hive are not compliant, so I think that should be changed as well. But in that
case, we can't use the same flag, because it would be inconsistent. What do you
think?
    
    I can understand and agree that losing precision looks scary. But to me,
returning `NULL` is, if possible, even scarier: indeed, `NULL` is what should
be returned if either of the two operands is `NULL`. Thus, queries that rely on
this and run fine on other DBs might return very bad results on Spark. For
instance, think of a report where we join a prices table and a sold_product
table per country. In this use case, we can assume that if the result is
`NULL`, it means that no product was sold in that country, and then coalesce
the output of the multiplication to 0. This would work well on any DB but
Spark, where the multiplication itself may silently return `NULL` on overflow
(a sketch of such a query is below).
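
    A purely illustrative spark-shell sketch of such a report (the tables,
columns and join keys are invented for the example):

```scala
// Hypothetical schema, just for illustration: countries(country),
// sold_products(country, product_id, quantity), prices(product_id, price),
// with price and quantity as wide DECIMAL columns.
// The report assumes that a NULL price * quantity can only mean "nothing
// sold in that country", so it coalesces it to 0. If the multiplication
// itself can return NULL on overflow, an overflow silently becomes a 0.
val report = spark.sql("""
  SELECT c.country,
         COALESCE(SUM(p.price * s.quantity), 0) AS revenue
  FROM countries c
  LEFT JOIN sold_products s ON s.country = c.country
  LEFT JOIN prices p ON p.product_id = s.product_id
  GROUP BY c.country
""")
report.show()
```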

    With my proposal of making `MINIMUM_ADJUSTED_SCALE` tunable, each customer
can decide (query by query) how much precision loss they can tolerate. And if
we agree to change the point 3 behavior to follow the SQL standard, when it is
not possible to meet their requirements we throw an exception, giving them the
choice of what to do: allow more precision loss, change their input data types,
etc. This is the safer way IMHO.
    
    I would be happy to help improve the test cases. May I just kindly ask how
you meant to do that? What would you like to see tested more? Would you like me
to add more test cases within the scope of this PR, or to open a new one for
that?
    
    Thank you for your time reading my long messages. I just want us to make
the best choice, and to give you all the elements I have so that we can decide
for the best together.
    Thank you.

