ulysses-you commented on PR #37207:
URL: https://github.com/apache/spark/pull/37207#issuecomment-1188527477

   thank you @gengliangwang for the review. For your concerns:
   
   > I think the issue is that the expression Divide allows precision loss by 
default when handling decimals. Setting the flag 
spark.sql.decimalOperations.allowPrecisionLoss as false will produce the same 
result in the test case 
https://github.com/apache/spark/pull/37207/files#diff-450322f1656760b5126d42d2334685a9b488540a267c1557c82f8e01f723a672R1026.
 Thus the current behavior is sort of expected, it is consistent with the 
result of dividing a decimal.
   
   This main issue is, in average evaluation we change the precison twice. The 
first time we change the result to the precision of divide which apply the 
config you point out. The second time we change the result to the precison of 
average which is different with divide. To solve it, this pr added a new 
decimal divide to avoid the first precision change. I'm not sure the precison 
of average should be same with divide, if so the result decimal type of average 
is wrong.
   
   > Besides, if we have to change the behavior of Average, we can consider 
having a decimal dividing expression that doesn't lose precision instead.
   
   That's what this pr did. This pr combines the decimal devide and overflow 
checking into one expression.
   
   > I think that's what the current code did. If there is a difference after 
this PR, please create a new test case.
   
   no, the current code checks the sum value if is overflow and this pr changes 
to check the result of average if is overflow. The sum data type is 
`DecimalType.bounded(p + 10, s)`  which means it will overflow with almost 1 
billion rows, so sorry for that it really hard to add the test case.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to