rwpenney commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r579760686
##########
File path: python/pyspark/sql/functions.py
##########
@@ -222,6 +222,45 @@ def sum_distinct(col):
return _invoke_function_over_column("sum_distinct", col)
+def product(col, scale=1.0):
+ """
+ Aggregate function: returns the product of the values in a group.
+
+ .. versionadded:: 3.2.0
+
+ Parameters
+ ----------
+ col : str or :class:`Column`
+ column containing values to be multiplied together
+ scale : float
Review comment:
Thanks, Sean, for picking this up again.
The scale factor gives the user some control over products that are
likely to overflow. For example, if the user knows that the values being
multiplied are typically around 100, they could set the scale to 0.01;
each term is then effectively multiplied by 0.01 first, which reduces the
risk of the product overflowing when large numbers of terms are combined.
The user would then have to account for a factor of 0.01^N (for N terms)
in their subsequent calculations, but that should be far easier than
dealing with overflow. By default, the scale is set to one, and the
implementation treats this as a special case that skips the redundant
multiplications by unity.
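
To make the intent concrete, here is a rough usage sketch, assuming the
`product(col, scale=...)` signature proposed in this diff (the DataFrame
contents are illustrative):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import math

spark = SparkSession.builder.getOrCreate()

# 1000 values, each close to 100: the raw product (~100^1000 = 1e2000)
# would overflow a double, whose maximum is roughly 1.8e308.
df = spark.range(1000).select((F.lit(100.0) + F.rand()).alias("x"))

# With scale=0.01, each term is effectively multiplied by 0.01, so the
# aggregate computes (true product) * 0.01^1000, keeping every running
# partial product near 1.0 and well inside double range.
scaled = df.agg(F.product("x", scale=0.01).alias("p")).first()["p"]

# The 0.01^N factor is then accounted for downstream, e.g. on a log scale:
# log10(true product) = log10(scaled) - 1000 * log10(0.01)
log10_true_product = math.log10(scaled) - 1000 * math.log10(0.01)
```

The point of the sketch is that the scaled per-term values stay near
unity, so the partial products never approach overflow even for very
long groups.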