itholic commented on code in PR #42788:
URL: https://github.com/apache/spark/pull/42788#discussion_r1321310948
##########
python/pyspark/pandas/frame.py:
##########
@@ -9029,6 +9047,14 @@ def cov(self, min_periods: Optional[int] = None, ddof: int = 1) -> "DataFrame":
"""
if not isinstance(ddof, int):
raise TypeError("ddof must be integer")
+ if numeric_only is None:
Review Comment:
Yeah, IMHO we should always recommend passing the `numeric_only` parameter
explicitly to prevent future confusion, because the default value of
`numeric_only` still differs from pandas even if the result happens to be
the same in some cases.
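To illustrate the pattern under discussion, here is a minimal, hypothetical sketch (plain Python, not the actual pyspark code) of how a `numeric_only=None` default can warn users and fall back to numeric columns, nudging callers to pass the parameter explicitly:

```python
import warnings


def mean(data, numeric_only=None):
    """Hypothetical sketch of the warn-on-None fallback discussed above.

    `data` is a dict of column name -> list of values.
    """
    if numeric_only is None:
        # Warn so callers pass numeric_only explicitly, since the
        # default may differ from pandas (the concern raised above).
        warnings.warn(
            "Specify numeric_only explicitly; its default value may "
            "differ from pandas.",
            FutureWarning,
        )
        numeric_only = True
    if numeric_only:
        # Keep only columns whose values are all int/float.
        data = {
            name: values
            for name, values in data.items()
            if all(isinstance(v, (int, float)) for v in values)
        }
    return {name: sum(values) / len(values) for name, values in data.items()}
```

Calling `mean({"a": [1, 2, 3], "b": ["x", "y", "z"]})` without `numeric_only` emits a `FutureWarning` and returns only the numeric column's mean, while an explicit `numeric_only=True` is silent.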
##########
python/pyspark/pandas/groupby.py:
##########
@@ -608,18 +608,18 @@ def max(self, numeric_only: Optional[bool] = False, min_count: int = -1) -> Fram
min_count=min_count,
)
- def mean(self, numeric_only: Optional[bool] = True) -> FrameLike:
+ def mean(self, numeric_only: Optional[bool] = None) -> FrameLike:
"""
Compute mean of groups, excluding missing values.
Parameters
----------
- numeric_only : bool, default True
+ numeric_only : bool, default None
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data. False is not supported.
This parameter is mainly for pandas compatibility.
- .. versionadded:: 3.4.0
+ .. versionchanged:: 4.0.0
Review Comment:
Makes sense to me. Updated!
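For reference, the recommended explicit usage can be sketched with plain pandas (the pandas-on-Spark `GroupBy.mean` call looks the same; `df` and its columns here are made-up example names):

```python
import pandas as pd

# Toy frame with one numeric and one non-numeric column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "value": [1.0, 3.0, 5.0],
        "label": ["x", "y", "z"],
    }
)

# Passing numeric_only explicitly avoids relying on a default that, per
# the review above, may differ between pandas and pyspark.pandas.
result = df.groupby("key").mean(numeric_only=True)
```

Here `result` contains only the `value` column, with the per-group means for keys `"a"` and `"b"`.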
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]