srowen commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r579821551



##########
File path: python/pyspark/sql/functions.py
##########
@@ -222,6 +222,45 @@ def sum_distinct(col):
     return _invoke_function_over_column("sum_distinct", col)
 
 
+def product(col, scale=1.0):
+    """
+    Aggregate function: returns the product of the values in a group.
+
+    .. versionadded:: 3.2.0
+
+    Parameters
+    ----------
+    col : str, :class:`Column`
+        column containing values to be multiplied together
+    scale : float

Review comment:
       I get that a very large float loses some precision, as it only has so
many significant digits to play with. But does scaling the values make the
result more precise? Each multiplication by 0.01 (or whatever) also introduces
some error, and it won't stop the final result from overflowing a double if it
really is that large, right? Or is the idea that you don't want the actual
product, but the scaled product? In that case, can't you just compute
`product(0.01 * $"col")` anyway, that kind of thing?
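
       To make that alternative concrete, something like the sketch below —
just an illustration of the API shape, assuming `product()` lands as a plain
one-argument aggregate that accepts a column expression; the column name
`values`, the data, and the 0.01 factor are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Small placeholder values, just to show the API shape.
df = spark.createDataFrame([(2.0,), (3.0,), (4.0,)], ["values"])

# Proposed in this PR: a dedicated scale parameter on product()
# df.agg(F.product("values", scale=0.01))

# Alternative: scale each value in the expression itself, so product()
# needs no extra parameter. The result is the product of (0.01 * values).
df.agg(F.product(F.col("values") * F.lit(0.01)).alias("scaled_product")).show()
```

       That would keep the scaling choice in user code, and `product` itself
stays a one-argument aggregate.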




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
