xinrong-databricks commented on code in PR #36444:
URL: https://github.com/apache/spark/pull/36444#discussion_r864298316
##########
python/pyspark/pandas/groupby.py:
##########
@@ -640,6 +640,17 @@ def std(self, ddof: int = 1) -> FrameLike:
"""
assert ddof in (0, 1)
+        # Raise a TypeError when all aggregation columns are of unaccepted data types
+        all_unaccepted = True
+        for _agg_col in self._agg_columns:
+            if isinstance(_agg_col.spark.data_type, (NumericType, BooleanType)):
+                all_unaccepted = False
+                break
+        if all_unaccepted:
+            raise TypeError(
Review Comment:
pandas 1.4 behaves as below:
```
>>> pdf = pd.DataFrame(
... {
... "A": [1, 2, 1, 2],
... "B": [3.1, 4.1, 4.1, 3.1],
... "C": ["a", "b", "b", "a"],
... "D": [True, False, False, True],
... }
... )
>>> pdf.groupby('A')[['C']].std()
Traceback (most recent call last):
...
ValueError: could not convert string to float: 'a'
>>> pdf.groupby('A').std()
B D
A
1 0.707107 0.707107
2 0.707107 0.707107
```
I think raising a `TypeError` here is more appropriate than the `ValueError` that pandas raises.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]