rwpenney commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r579826841



##########
File path: python/pyspark/sql/functions.py
##########
@@ -222,6 +222,45 @@ def sum_distinct(col):
     return _invoke_function_over_column("sum_distinct", col)
 
 
+def product(col, scale=1.0):
+    """
+    Aggregate function: returns the product of the values in a group.
+
+    .. versionadded:: 3.2.0
+
+    Parameters
+    ----------
+    col : str, :class:`Column`
+        column containing values to be multiplied together
+    scale : float

Review comment:
       Agreed, the `scale` parameter isn't there to improve precision. (Perhaps 
my "0.01" example wasn't ideal; 1/128 might have been a better choice, as it is 
less likely to cause noise in the least significant bits.) The aim, as you say, 
is to allow the user to compute the *scaled* product, on the assumption that 
they can allow for the *overall* scaling (i.e. the 0.01^N or 2^(-7N)) in some 
other way at subsequent stages of their calculation. Clearly, there's no point 
in having this scaling within `product()` if the user just multiplies the 
result by (scale ^ -N).
   
   Again, you're right that the user could just use `product(0.01 * $"col")`, 
but it seemed worth offering an overload that invites the user to consider 
applying a scaling when they can predict something about the order of magnitude 
of their product; it might also make their intention a little clearer than just 
multiplying `$"col"` by 0.01. It also allows a little more optimization when 
`scale` is the result of a calculation or a configuration parameter and could 
turn out to be 1.
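
   For illustration only (this is not part of the PR diff above), a minimal 
PySpark sketch of the behaviour being discussed: the scaling is applied to the 
column before the existing `product` aggregation, and the overall `scale ** N` 
factor is divided out afterwards. The DataFrame, the column name `x`, and the 
scale value 1/128 are assumptions made up for the example.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2.0,), (4.0,), (8.0,)], ["x"])  # example data only

scale = 1.0 / 128  # assumed scale; a power of two avoids rounding noise

# Scaled product: each value is multiplied by `scale` before aggregating,
# which is the effect the proposed `scale` parameter would provide.
row = df.agg(
    F.product(F.col("x") * scale).alias("scaled_product"),
    F.count("x").alias("n"),
).first()

# If the overall scaling is not absorbed elsewhere in the calculation,
# the plain product can be recovered by dividing out scale ** N.
plain_product = row["scaled_product"] / (scale ** row["n"])
```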



