GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/5842#discussion_r29565642
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -889,6 +889,23 @@ def cov(self, col1, col2):
raise ValueError("col2 should be a string.")
return self._jdf.stat().cov(col1, col2)
+ def crosstab(self, col1, col2):
+ """
+ Computes a pair-wise frequency table of the given columns. Also known as a contingency
+ table. The number of distinct values for each column should be less than 1e5. The first
+ column of each row will be the distinct values of `col1` and the column names will be the
--- End diff --
Document the first column name. `1e5` -> `1e4`.
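
As a rough illustration of what the docstring could spell out, here is a pure-Python sketch of a pair-wise frequency table. It does not use Spark. The `crosstab_sketch` helper, the sample rows, and the `col1_col2` name for the first column are assumptions for illustration only; the naming mirrors the `$col1_$col2` convention described in the released `DataFrame.crosstab` documentation and is not confirmed by this diff.

```python
from collections import Counter


def crosstab_sketch(rows, col1, col2):
    """Build a contingency table from a list of dicts (hypothetical helper).

    Returns (header, table). The first header entry is named
    "<col1>_<col2>", which is an assumed convention, not taken from the diff.
    """
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)

    # Distinct values of col1 become row labels, those of col2 column names.
    row_keys = sorted({pair[0] for pair in counts}, key=str)
    col_keys = sorted({pair[1] for pair in counts}, key=str)

    header = [f"{col1}_{col2}"] + [str(c) for c in col_keys]
    # Counter returns 0 for pairs that never occur together.
    table = [[str(r)] + [counts[(r, c)] for c in col_keys] for r in row_keys]
    return header, table


# Made-up sample data for the sketch.
rows = [
    {"a": 1, "b": "x"},
    {"a": 1, "b": "y"},
    {"a": 2, "b": "x"},
    {"a": 1, "b": "x"},
]
header, table = crosstab_sketch(rows, "a", "b")
print(header)
for line in table:
    print(line)
```

The first entry of each row holds a distinct value of `a`, and the remaining column names are the distinct values of `b`, which is why the name of that first column is worth documenting explicitly.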