itholic commented on PR #42788:
URL: https://github.com/apache/spark/pull/42788#issuecomment-1714912254
So far, we haven't followed the pandas behavior because we couldn't support
object dtype for stat functions in some cases, as shown below:
```python
# DataFrame
>>> pdf
   A  B
0  1  a
1  2  b
2  3  c

# pandas works
>>> pdf.min(numeric_only=False)
A    1
B    a
dtype: object

# pandas API on Spark doesn't work
>>> ps.from_pandas(pdf).min(numeric_only=False)
...
pyarrow.lib.ArrowInvalid: Could not convert 'a' with type str: tried to convert to int64
```
On second thought, though, this is a bug in our pandas API on Spark code, so
we can support `numeric_only=False` as the default by fixing the existing bug.
Let me close this ticket and change the default value instead.
Thanks for pointing this out, @zhengruifeng!
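
For reference, here is a minimal pandas-only sketch of the behavior we would be matching. The frame is rebuilt from the example above, and the variable names (`all_cols`, `numeric_cols`) are just illustrative; it does not touch Spark, so it only shows the target semantics, not the fix itself.

```python
import pandas as pd

# Same shape as the frame in the example: one integer column, one string column.
pdf = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})

# numeric_only=False reduces every column, including the string column "B".
all_cols = pdf.min(numeric_only=False)

# numeric_only=True drops non-numeric columns before reducing, so only "A" remains.
numeric_cols = pdf.min(numeric_only=True)

print(all_cols)
print(numeric_cols)
```

Once the bug is fixed, `ps.from_pandas(pdf).min(numeric_only=False)` should presumably return the same labels and values as `all_cols` here.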
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.