khalidmammadov commented on code in PR #37662:
URL: https://github.com/apache/spark/pull/37662#discussion_r955964731
##########
python/pyspark/sql/functions.py:
##########
@@ -2301,13 +2532,46 @@ def count_distinct(col: "ColumnOrName", *cols: "ColumnOrName") -> Column:
.. versionadded:: 3.2.0
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ first column to compute on.
+ cols : :class:`~pyspark.sql.Column` or str
+ other columns to compute on.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+        number of distinct values in the given column(s).
+
Examples
--------
>>> df.agg(count_distinct(df.age, df.name).alias('c')).collect()
[Row(c=2)]
>>> df.agg(count_distinct("age", "name").alias('c')).collect()
[Row(c=2)]
Review Comment:
Removed
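
For reviewers without a Spark session at hand, the semantics documented above can be sketched in plain Python (an analogy only, not the Spark implementation; the sample rows here are assumed, mirroring the doctest's `name`/`age` data): `count(DISTINCT a, b)` counts unique column-value combinations and, per SQL semantics, skips rows where any of the values is NULL.

```python
# Plain-Python analogy of count(DISTINCT name, age): count unique
# (name, age) pairs, skipping rows containing None, since SQL
# COUNT(DISTINCT ...) ignores NULLs.
rows = [("Alice", 2), ("Bob", 5), ("Alice", 2)]  # assumed sample rows

distinct_pairs = {r for r in rows if all(v is not None for v in r)}
print(len(distinct_pairs))  # 2 distinct (name, age) combinations
```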
##########
python/pyspark/sql/functions.py:
##########
@@ -2329,13 +2593,34 @@ def first(col: "ColumnOrName", ignorenulls: bool = False) -> Column:
The function is non-deterministic because its result depends on the order of the
rows, which may be non-deterministic after a shuffle.
+ Parameters
+ ----------
+ col : :class:`~pyspark.sql.Column` or str
+ column to fetch first value for.
+    ignorenulls : bool
+        if the first value is null, return the first non-null value instead.
+
+ Returns
+ -------
+ :class:`~pyspark.sql.Column`
+ first value of the group.
+
Examples
--------
- >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
+ >>> df = df.orderBy(df.age)
>>> df.groupby("name").agg(first("age")).orderBy("name").show()
+-----+----------+
| name|first(age)|
+-----+----------+
+ |Alice| null|
+ | Bob| 5|
+ +-----+----------+
+
+ >>> df.groupby("name").agg(first("age", True)).orderBy("name").show()
Review Comment:
Added
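
The `ignorenulls` behaviour added above can also be sketched in plain Python (an analogy only, not the Spark implementation; the `first_value` helper and the sample list are hypothetical): with `ignorenulls=False` the first row's value is returned even when it is null, while `True` scans forward to the first non-null value, which is why the doctest shows `null` for Alice after `orderBy(df.age)` sorts nulls first.

```python
# Plain-Python analogy of first(col, ignorenulls): by default return the
# first value even if it is None; with ignorenulls=True, skip Nones.
def first_value(values, ignorenulls=False):
    for v in values:
        if not ignorenulls or v is not None:
            return v
    return None  # empty group, or all values null

ages = [None, 2]  # Alice's ages after orderBy(df.age) puts null first
print(first_value(ages))         # None
print(first_value(ages, True))   # 2
```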
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]