[spark] branch master updated: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

ruifengz Wed, 09 Nov 2022 23:42:52 -0800

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 40a9a6ef5b8 [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `
40a9a6ef5b8 is described below

commit 40a9a6ef5b89f0c3d19db4a43b8a73decaa173c3
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Thu Nov 10 15:42:19 2022 +0800

    [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `
    
    ### What changes were proposed in this pull request?
    Remove the outdated comments about the limits on distinct values and returned pair frequencies.
    
    ### Why are the changes needed?
    The limitations no longer hold after the [reimplementation](https://github.com/apache/spark/pull/38340).
    
    ### Does this PR introduce _any_ user-facing change?
    yes
    
    ### How was this patch tested?
    Doc-only change.
    
    Closes #38579 from zhengruifeng/doc_crosstab.
    
    Lead-authored-by: Ruifeng Zheng <ruife...@apache.org>
    Co-authored-by: Ruifeng Zheng <ruife...@foxmail.com>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/sql/dataframe.py                                        | 3 +--
 .../src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala   | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 3c787f8900f..6d5014918bf 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4217,8 +4217,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
     def crosstab(self, col1: str, col2: str) -> "DataFrame":
         """
         Computes a pair-wise frequency table of the given columns. Also known as a contingency
-        table. The number of distinct values for each column should be less than 1e4. At most 1e6
-        non-zero pair frequencies will be returned.
+        table.
         The first column of each row will be the distinct values of `col1` and the column names
         will be the distinct values of `col2`. The name of the first column will be `$col1_$col2`.
         Pairs that have no occurrences will have zero as their counts.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index efd430633d7..7511c21fa76 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -181,8 +181,6 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
 
   /**
    * Computes a pair-wise frequency table of the given columns. Also known as a contingency table.
-   * The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero
-   * pair frequencies will be returned.
    * The first column of each row will be the distinct values of `col1` and the column names will
    * be the distinct values of `col2`. The name of the first column will be `col1_col2`. Counts
    * will be returned as `Long`s. Pairs that have no occurrences will have zero as their counts.
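
For readers unfamiliar with the output layout the docstrings above describe, here is a minimal pure-Python sketch of a contingency table with the same shape: the first column holds the distinct values of `col1`, the remaining column names are the distinct values of `col2`, and pairs with no occurrences get a count of zero. This is only an illustration of the documented layout, not Spark's implementation; the helper name `crosstab_sketch`, the sample rows, and the sorted ordering of rows and columns are assumptions made for the example.

```python
from collections import Counter


def crosstab_sketch(rows, col1, col2):
    """Illustrative only: mimics the documented crosstab layout, not Spark's code."""
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)
    # Sorting is an assumption made here for deterministic output.
    row_keys = sorted({r for r, _ in counts}, key=str)
    col_keys = sorted({c for _, c in counts}, key=str)
    # The first column is named "<col1>_<col2>"; the rest are col2's distinct values.
    header = [f"{col1}_{col2}"] + [str(c) for c in col_keys]
    # Counter returns 0 for pairs that never occurred.
    table = [[str(r)] + [counts[(r, c)] for c in col_keys] for r in row_keys]
    return header, table


# Hypothetical sample data.
rows = [
    {"c1": 1, "c2": 11},
    {"c1": 1, "c2": 12},
    {"c1": 3, "c2": 12},
    {"c1": 4, "c2": 12},
    {"c1": 4, "c2": 12},
]
header, table = crosstab_sketch(rows, "c1", "c2")
```

With these sample rows, `header` is `["c1_c2", "11", "12"]` and the row for `3` carries a zero under `11`, since that pair never occurs.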


