[GitHub] spark pull request: [SPARK-7243][SQL] Contingency Tables for DataF...

mengxr Sun, 03 May 2015 21:28:00 -0700

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5842#discussion_r29565642
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -889,6 +889,23 @@ def cov(self, col1, col2):
                 raise ValueError("col2 should be a string.")
             return self._jdf.stat().cov(col1, col2)
     
    +    def crosstab(self, col1, col2):
    +        """
    +        Computes a pair-wise frequency table of the given columns. Also known as a contingency
    +        table. The number of distinct values for each column should be less than 1e5. The first
    +        column of each row will be the distinct values of `col1` and the column names will be the
    --- End diff --
    
    Document the first column name. `1e5` -> `1e4`.
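
For context, here is a minimal plain-Python sketch of the pair-wise frequency table the docstring describes. It is not the Spark implementation: `crosstab_sketch` and the sample rows are hypothetical, the `col1_col2` label for the first column is an assumption used only to show why that name is worth documenting, and the sorting is there just to make the output deterministic.

```python
from collections import Counter


def crosstab_sketch(rows, col1, col2):
    """Build a contingency table from a list of dict rows (illustrative only)."""
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)
    # Distinct values of col1 become row labels; distinct values of col2 become column names.
    row_keys = sorted({k1 for k1, _ in counts})
    col_keys = sorted({k2 for _, k2 in counts})
    # Assumed label for the first column: "<col1>_<col2>".
    header = [f"{col1}_{col2}"] + [str(k) for k in col_keys]
    # Pairs that never occur get a count of 0 (Counter returns 0 for missing keys).
    table = [[k1] + [counts[(k1, k2)] for k2 in col_keys] for k1 in row_keys]
    return header, table


rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "x"},
]
header, table = crosstab_sketch(rows, "a", "b")
print(header)  # ['a_b', 'x', 'y']
print(table)   # [[1, 2, 1], [2, 1, 0]]
```

The table has one row per distinct `col1` value and one column per distinct `col2` value, so its size grows with the product of the two distinct counts. That is presumably why the docstring states an upper bound on the number of distinct values.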




