amaliujia commented on code in PR #38938:
URL: https://github.com/apache/spark/pull/38938#discussion_r1041691534
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame":
             session=self._session,
         )
+    def describe(self, *cols: str) -> "DataFrame":
+        _cols: List[str] = list(cols)
Review Comment:
```suggestion
        """Computes basic statistics for numeric and string columns.

        .. versionadded:: 3.4.0

        This includes count, mean, stddev, min, and max. If no columns are
        given, this function computes statistics for all numerical or string
        columns.

        Notes
        -----
        This function is meant for exploratory data analysis, as we make no
        guarantee about the backward compatibility of the schema of the
        resulting :class:`DataFrame`.

        Use summary for expanded statistics and control over which
        statistics to compute.

        Parameters
        ----------
        cols : str, list, optional
            Column name or list of column names to describe by (default: all
            columns).

        Returns
        -------
        :class:`DataFrame`
            A new DataFrame that describes (provides statistics for) the
            given DataFrame.
        """
        _cols: List[str] = list(cols)
```
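For context, a minimal sketch of how the added `describe` would be exercised from a Spark Connect client. The remote URL, sample data, and column names below are illustrative assumptions and are not part of this PR:

```python
# Sketch only: assumes a Spark Connect server reachable at the given URL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 2, 85.5), ("Bob", 5, 91.0)], ["name", "age", "score"]
)

# No arguments: statistics for all numeric and string columns.
df.describe().show()

# Restricted to the named columns.
df.describe("age", "score").show()

# summary() gives expanded statistics and control over which ones to compute.
df.summary("count", "mean", "max").show()
```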
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]