rwpenney commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r579760686
##########
File path: python/pyspark/sql/functions.py
##########
@@ -222,6 +222,45 @@ def sum_distinct(col):
return _invoke_function_over_column("sum_distinct", col)
+def product(col, scale=1.0):
+ """
+ Aggregate function: returns the product of the values in a group.
+
+ .. versionadded:: 3.2.0
+
+ Parameters
+ ----------
+ col : str or :class:`Column`
+ column containing values to be multiplied together
+ scale : float
Review comment:
Thanks, Sean, for picking this up again.
The scale factor gives the user some control over products that are
likely to overflow. For example, if the user knows that the values being
multiplied are typically around 100, they could set the scale to 0.01;
each term is then effectively multiplied by 0.01 first, which reduces the
risk of the product overflowing when large numbers of terms are combined.
The user would then have to account for a factor of 0.01^N (for N terms)
in their subsequent calculations, but that should be far easier than
dealing with overflow. By default, the scale is set to one, and the
implementation treats this as a special case that skips the redundant
multiplications by unity.
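
To make the intent concrete, here is a rough usage sketch, assuming the
`product(col, scale=...)` signature proposed in this diff (the DataFrame
contents are illustrative):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import math

spark = SparkSession.builder.getOrCreate()

# 1000 values, each close to 100: the raw product (~100^1000 = 1e2000)
# would overflow a double, whose maximum is roughly 1.8e308.
df = spark.range(1000).select((F.lit(100.0) + F.rand()).alias("x"))

# With scale=0.01, each term is effectively multiplied by 0.01, so the
# aggregate computes (true product) * 0.01^1000, keeping every running
# partial product near 1.0 and well inside double range.
scaled = df.agg(F.product("x", scale=0.01).alias("p")).first()["p"]

# The 0.01^N factor is then accounted for downstream, e.g. on a log scale:
# log10(true product) = log10(scaled) - 1000 * log10(0.01)
log10_true_product = math.log10(scaled) - 1000 * math.log10(0.01)
```

The point of the sketch is that the scaled per-term values stay near
unity, so the partial products never approach overflow even for very
long groups.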