rwpenney edited a comment on pull request #30745:
URL: https://github.com/apache/spark/pull/30745#issuecomment-747671362
I'm not sure I see how you'll get such a general-purpose function that is
more numerically stable than simply multiplying a set of numbers together.
The primary argument for **preferring** `exp(sum(log(...)))` is when one can
compute the logarithm of the quantities of interest directly, _without_ having
to call the `log` function at all. An obvious situation is when one is dealing with
probabilities of quantities drawn from a Gaussian distribution, where the
terms with the largest dynamic range are of the form $e^{-x^2/2}$, and so can
be converted to log-space trivially.
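As a purely illustrative sketch (not tied to the implementation in this PR, and with made-up sample values), the Gaussian case looks something like this: the log-density is available in closed form, so one never calls `log` on a density, and the sum of log-densities stays finite even when an individual density term underflows to zero in double precision:

```scala
object GaussianLogSpace {
  def main(args: Array[String]): Unit = {
    // Hypothetical samples; the last one is far enough into the tail that its
    // density exp(-x*x/2) / sqrt(2*pi) underflows to 0.0 in double precision.
    val xs = Seq(0.5, -1.2, 2.0, 40.0)

    // The log-density is available directly: log f(x) = -x*x/2 - 0.5*log(2*pi).
    val logDensities = xs.map(x => -x * x / 2.0 - 0.5 * math.log(2.0 * math.Pi))

    // Direct product of the densities: the tail sample contributes exactly 0.0,
    // so the product collapses to zero and all information is lost.
    val directProduct = logDensities.map(math.exp).product

    // Working in log-space keeps the full dynamic range; in a case like this one
    // would typically report the log-likelihood rather than exponentiate at the end.
    val logLikelihood = logDensities.sum

    println(s"direct product of densities: $directProduct")  // 0.0
    println(s"sum of log-densities:        $logLikelihood")  // roughly -806.5
  }
}
```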
By contrast, I'm rather doubtful that one actually reduces round-off error by
taking a set of plain quantities, computing their logarithms, summing them, and
then exponentiating the result. Rounding errors arise in every one of the
non-linear `log` and `exp` evaluations, and I think one would have to do quite
a careful error analysis to demonstrate that this route is more accurate, in
general, than just multiplying the numbers directly.
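A quick way to probe that claim (again purely illustrative: the values, the seed and the use of `BigDecimal` as a high-precision reference are my own choices here) is to compare both routes against a much higher-precision product for well-scaled inputs, where neither overflow nor underflow is in play:

```scala
object ProductRoundoff {
  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42)
    // Well-scaled positive inputs: no risk of overflow or underflow.
    val xs = Seq.fill(1000)(0.5 + rng.nextDouble())

    // High-precision reference product (DECIMAL128, ~34 significant digits).
    val reference = xs.map(BigDecimal(_)).product

    val direct  = xs.product                      // one rounding per multiplication
    val viaLogs = math.exp(xs.map(math.log).sum)  // rounding in every log, in the sum, and in exp

    def relativeError(v: Double): BigDecimal = ((BigDecimal(v) - reference) / reference).abs

    println(s"relative error, direct product:     ${relativeError(direct)}")
    println(s"relative error, exp(sum(log(...))): ${relativeError(viaLogs)}")
  }
}
```

Both routes accumulate rounding from every floating-point operation involved, so a comparison along these lines (or a proper error analysis) seems necessary before claiming a general accuracy advantage for the log/exp route.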
Do you have a particular algebraic expression in mind that is mathematically
equivalent to the product of a set of numbers, but which is indeed more
accurate for the most likely use-cases when working with finite-precision
arithmetic?