itholic commented on code in PR #42788:
URL: https://github.com/apache/spark/pull/42788#discussion_r1321310948
##########
python/pyspark/pandas/frame.py:
##########
@@ -9029,6 +9047,14 @@ def cov(self, min_periods: Optional[int] = None, ddof: int = 1) -> "DataFrame":
"""
if not isinstance(ddof, int):
raise TypeError("ddof must be integer")
+ if numeric_only is None:
Review Comment:
Yeah, IMHO we should always recommend passing the `numeric_only` parameter
explicitly to prevent future confusion, because the default value of
`numeric_only` still differs from pandas even if the result happens to be
the same in some cases.
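To illustrate the pattern under discussion, here is a minimal, hypothetical sketch (plain Python, not the actual pyspark code) of how a `numeric_only=None` default can warn users and fall back to numeric columns, nudging callers to pass the parameter explicitly:

```python
import warnings


def mean(data, numeric_only=None):
    """Hypothetical sketch of the warn-on-None fallback discussed above.

    `data` is a dict of column name -> list of values.
    """
    if numeric_only is None:
        # Warn so callers pass numeric_only explicitly, since the
        # default may differ from pandas (the concern raised above).
        warnings.warn(
            "Specify numeric_only explicitly; its default value may "
            "differ from pandas.",
            FutureWarning,
        )
        numeric_only = True
    if numeric_only:
        # Keep only columns whose values are all int/float.
        data = {
            name: values
            for name, values in data.items()
            if all(isinstance(v, (int, float)) for v in values)
        }
    return {name: sum(values) / len(values) for name, values in data.items()}
```

Calling `mean({"a": [1, 2, 3], "b": ["x", "y", "z"]})` without `numeric_only` emits a `FutureWarning` and returns only the numeric column's mean, while an explicit `numeric_only=True` is silent.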
##########
python/pyspark/pandas/groupby.py:
##########
@@ -608,18 +608,18 @@ def max(self, numeric_only: Optional[bool] = False, min_count: int = -1) -> Fram
min_count=min_count,
)
- def mean(self, numeric_only: Optional[bool] = True) -> FrameLike:
+ def mean(self, numeric_only: Optional[bool] = None) -> FrameLike:
"""
Compute mean of groups, excluding missing values.
Parameters
----------
- numeric_only : bool, default True
+ numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. False is not supported.
This parameter is mainly for pandas compatibility.
- .. versionadded:: 3.4.0
+ .. versionchanged:: 4.0.0
Review Comment:
Makes sense to me. Updated!
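For reference, the recommended explicit usage can be sketched with plain pandas (the pandas-on-Spark `GroupBy.mean` call looks the same; `df` and its columns here are made-up example names):

```python
import pandas as pd

# Toy frame with one numeric and one non-numeric column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "value": [1.0, 3.0, 5.0],
        "label": ["x", "y", "z"],
    }
)

# Passing numeric_only explicitly avoids relying on a default that, per
# the review above, may differ between pandas and pyspark.pandas.
result = df.groupby("key").mean(numeric_only=True)
```

Here `result` contains only the `value` column, with the per-group means for keys `"a"` and `"b"`.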
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]