HyukjinKwon commented on code in PR #42596:
URL: https://github.com/apache/spark/pull/42596#discussion_r1300942561
##########
python/pyspark/sql/functions.py:
##########
@@ -3669,38 +3669,83 @@ def approxCountDistinct(col: "ColumnOrName", rsd: Optional[float] = None) -> Col
@try_remote_functions
def approx_count_distinct(col: "ColumnOrName", rsd: Optional[float] = None) -> Column:
- """Aggregate function: returns a new :class:`~pyspark.sql.Column` for approximate distinct count
- of column `col`.
+ """
+ Applies an aggregate function to return an approximate distinct count of the specified column.
- .. versionadded:: 2.1.0
+ This function returns a new :class:`~pyspark.sql.Column` that estimates the number of distinct
+ elements in a column or a group of columns.
- .. versionchanged:: 3.4.0
- Supports Spark Connect.
+ .. versionadded:: 2.1.0
.. versionchanged:: 3.4.0
Supports Spark Connect.
Parameters
----------
col : :class:`~pyspark.sql.Column` or str
+ The label of the column to count distinct values in.
rsd : float, optional
- maximum relative standard deviation allowed (default = 0.05).
- For rsd < 0.01, it is more efficient to use :func:`count_distinct`
+ The maximum allowed relative standard deviation (default = 0.05).
+ If rsd < 0.01, it would be more efficient to use :func:`count_distinct`.
Returns
-------
:class:`~pyspark.sql.Column`
- the column of computed results.
+ A new Column object representing the approximate unique count.
+
+ See Also
+ ----------
+ :meth:`pyspark.sql.functions.count_distinct`
Examples
--------
+ Example 1: Counting distinct values in a single column DataFrame representing integers
+
+ >>> from pyspark.sql.functions import approx_count_distinct
Review Comment:
And the second reason is that `from pyspark.sql.functions import approx_count_distinct` is perfectly fine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]