amaliujia commented on code in PR #38938:
URL: https://github.com/apache/spark/pull/38938#discussion_r1041691534
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -1239,6 +1239,16 @@ def summary(self, *statistics: str) -> "DataFrame":
             session=self._session,
         )
+    def describe(self, *cols: str) -> "DataFrame":
+        _cols: List[str] = list(cols)
Review Comment:
```suggestion
        """Computes basic statistics for numeric and string columns.

        .. versionadded:: 3.4.0

        This includes count, mean, stddev, min, and max. If no columns are
        given, this function computes statistics for all numerical or string
        columns.

        Notes
        -----
        This function is meant for exploratory data analysis, as we make no
        guarantee about the backward compatibility of the schema of the
        resulting :class:`DataFrame`.

        Use summary for expanded statistics and control over which
        statistics to compute.

        Parameters
        ----------
        cols : str, list, optional
            Column name or list of column names to describe by (default: all
            columns).

        Returns
        -------
        :class:`DataFrame`
            A new DataFrame that describes (provides statistics for) the
            given DataFrame.
        """
        _cols: List[str] = list(cols)
```
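For context, a minimal sketch of how the added `describe` would be exercised from a Spark Connect client. The remote URL, sample data, and column names below are illustrative assumptions and are not part of this PR:

```python
# Sketch only: assumes a Spark Connect server reachable at the given URL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.createDataFrame(
    [("Alice", 2, 85.5), ("Bob", 5, 91.0)], ["name", "age", "score"]
)

# No arguments: statistics for all numeric and string columns.
df.describe().show()

# Restricted to the named columns.
df.describe("age", "score").show()

# summary() gives expanded statistics and control over which ones to compute.
df.summary("count", "mean", "max").show()
```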
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]