zhengruifeng commented on code in PR #37945:
URL: https://github.com/apache/spark/pull/37945#discussion_r976089939
##########
python/pyspark/pandas/series.py:
##########
@@ -3333,29 +3348,74 @@ def corr(self, other: "Series", method: str = "pearson") -> float:
... 's2': [.3, .6, .0, .1]})
>>> s1 = df.s1
>>> s2 = df.s2
- >>> s1.corr(s2, method='pearson') # doctest: +ELLIPSIS
- -0.851064...
+ >>> s1.corr(s2, method='pearson')
+ -0.85106...
- >>> s1.corr(s2, method='spearman') # doctest: +ELLIPSIS
- -0.948683...
+ >>> s1.corr(s2, method='spearman')
+ -0.94868...
- Notes
- -----
- There are behavior differences between pandas-on-Spark and pandas.
Review Comment:
The previous differences listed here are all resolved.
The last remaining difference is that `method` in pandas accepts a `Callable`...
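
For context on the remaining gap, here is a minimal sketch of what passing a `Callable` as `method` looks like in plain pandas, where the callable receives two 1-D arrays and returns a float. The function `sign_agreement` and the sample data are hypothetical and chosen only for illustration; nothing here is taken from this PR.

```python
import numpy as np
import pandas as pd


def sign_agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical similarity measure (illustration only).

    Returns the fraction of positions where both inputs deviate
    from their own mean in the same direction.
    """
    return float(np.mean(np.sign(a - a.mean()) == np.sign(b - b.mean())))


# Made-up data: s2 decreases while s1 increases.
s1 = pd.Series([1.0, 2.0, 3.0, 4.0])
s2 = pd.Series([4.0, 3.0, 2.0, 1.5])

# pandas hands the aligned, NaN-filtered values to the callable.
result = s1.corr(s2, method=sign_agreement)
print(result)
```

According to the comment above, the pandas-on-Spark `corr` in this PR accepts only the string methods, so a call like this would be expected to work in pandas only.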
##########
python/pyspark/pandas/series.py:
##########
@@ -3312,16 +3318,25 @@ def autocorr(self, periods: int = 1) -> float:
)
return np.nan if corr is None else corr
- def corr(self, other: "Series", method: str = "pearson") -> float:
+ def corr(
+ self, other: "Series", method: str = "pearson", min_periods: Optional[int] = None
+ ) -> float:
"""
Compute correlation with `other` Series, excluding missing values.
+ .. versionadded:: 3.3.0
+
Parameters
----------
other : Series
- method : {'pearson', 'spearman'}
+ method : {'pearson', 'spearman', 'kendall'}
* pearson : standard correlation coefficient
* spearman : Spearman rank correlation
+ * kendall : Kendall Tau correlation coefficient
Review Comment:
nice!
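
As a reference for the newly listed `kendall` option, the following is a minimal pure-Python sketch of Kendall's tau-b on two equal-length sequences. It is not the implementation in this PR (which runs on Spark); the function name `kendall_tau_b` and the quadratic pairwise loop are assumptions made purely to show what the coefficient measures.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Illustrative O(n^2) Kendall tau-b; not the PR's implementation."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        elif dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1  # pair ordered the same way in x and y
        else:
            discordant += 1  # pair ordered in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))
print(kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1]))
```

Identical orderings give the maximum value and fully reversed orderings give the minimum; mixed orderings fall in between.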