This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 40a9a6ef5b8 [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `
40a9a6ef5b8 is described below

commit 40a9a6ef5b89f0c3d19db4a43b8a73decaa173c3
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Thu Nov 10 15:42:19 2022 +0800

    [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

    ### What changes were proposed in this pull request?
    remove the outdated comments

    ### Why are the changes needed?
    the limitations are not true after [reimplementation](https://github.com/apache/spark/pull/38340)

    ### Does this PR introduce _any_ user-facing change?
    yes

    ### How was this patch tested?
    doc - only

    Closes #38579 from zhengruifeng/doc_crosstab.

    Lead-authored-by: Ruifeng Zheng <ruife...@apache.org>
    Co-authored-by: Ruifeng Zheng <ruife...@foxmail.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/sql/dataframe.py                                      | 3 +--
 .../src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 3c787f8900f..6d5014918bf 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4217,8 +4217,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
     def crosstab(self, col1: str, col2: str) -> "DataFrame":
         """
         Computes a pair-wise frequency table of the given columns. Also known as a contingency
-        table. The number of distinct values for each column should be less than 1e4. At most 1e6
-        non-zero pair frequencies will be returned.
+        table.
         The first column of each row will be the distinct values of `col1` and the column names
         will be the distinct values of `col2`. The name of the first column will be `$col1_$col2`.
         Pairs that have no occurrences will have zero as their counts.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index efd430633d7..7511c21fa76 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -181,8 +181,6 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
 
   /**
    * Computes a pair-wise frequency table of the given columns. Also known as a contingency table.
-   * The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero
-   * pair frequencies will be returned.
    * The first column of each row will be the distinct values of `col1` and the column names will
    * be the distinct values of `col2`. The name of the first column will be `col1_col2`. Counts
    * will be returned as `Long`s. Pairs that have no occurrences will have zero as their counts.
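
The table layout that the updated docstring describes can be illustrated with a small pure-Python sketch. This is only an illustration of the documented output shape, not Spark's implementation: `crosstab_sketch` is a hypothetical helper, the sample rows are made up, and the sorted ordering of rows and columns is an assumption added here to keep the output deterministic.

```python
from collections import Counter


def crosstab_sketch(rows, col1, col2):
    """Hypothetical helper mimicking the documented crosstab layout.

    Not Spark's implementation; it only mirrors the shape in the docstring.
    """
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)

    # Distinct col1 values become rows; distinct col2 values become columns.
    # Sorting is an assumption made here for deterministic output.
    row_keys = sorted({k1 for k1, _ in counts}, key=str)
    col_keys = sorted({k2 for _, k2 in counts}, key=str)

    # The first column is named "<col1>_<col2>"; the rest are col2's values.
    header = [f"{col1}_{col2}"] + [str(k2) for k2 in col_keys]

    # Pairs that never occur get a count of zero.
    table = [
        [k1] + [counts.get((k1, k2), 0) for k2 in col_keys]
        for k1 in row_keys
    ]
    return header, table


# Made-up sample data for illustration only.
rows = [
    {"c1": 1, "c2": 11},
    {"c1": 1, "c2": 12},
    {"c1": 3, "c2": 10},
    {"c1": 4, "c2": 8},
    {"c1": 4, "c2": 8},
]
header, table = crosstab_sketch(rows, "c1", "c2")
```

With this sample, the first column is named `c1_c2`, and the pair `(4, 8)`, which occurs twice, is the only cell with a count of 2.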