xinrong-databricks commented on code in PR #36444:
URL: https://github.com/apache/spark/pull/36444#discussion_r864298316
##########
python/pyspark/pandas/groupby.py:
##########
@@ -640,6 +640,17 @@ def std(self, ddof: int = 1) -> FrameLike:
"""
assert ddof in (0, 1)
+        # Raise a TypeError when all aggregation columns are of unaccepted data types
+        all_unaccepted = True
+        for _agg_col in self._agg_columns:
+            if isinstance(_agg_col.spark.data_type, (NumericType, BooleanType)):
+                all_unaccepted = False
+                break
+        if all_unaccepted:
+            raise TypeError(
Review Comment:
pandas 1.4 behaves as below:
```
>>> pdf = pd.DataFrame(
... {
... "A": [1, 2, 1, 2],
... "B": [3.1, 4.1, 4.1, 3.1],
... "C": ["a", "b", "b", "a"],
... "D": [True, False, False, True],
... }
... )
>>> pdf.groupby('A')[['C']].std()
Traceback (most recent call last):
...
ValueError: could not convert string to float: 'a'
>>> pdf.groupby('A').std()
B D
A
1 0.707107 0.707107
2 0.707107 0.707107
```
I think raising a `TypeError` here is more appropriate than the `ValueError` that pandas raises.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]