[GitHub] [spark] itholic commented on pull request #42788: [SPARK-43291][PS] Generate proper warning on different behavior with `numeric_only`

via GitHub Mon, 11 Sep 2023 20:36:00 -0700


itholic commented on PR #42788:
URL: https://github.com/apache/spark/pull/42788#issuecomment-1714912254


   So far, we haven't followed the Pandas behavior because, in some cases, we couldn't support object-dtype columns in stat functions, as shown below:
   ```python
   # DataFrame
   >>> import pandas as pd
   >>> import pyspark.pandas as ps
   >>> pdf = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
   >>> pdf
      A  B
   0  1  a
   1  2  b
   2  3  c
   
   # Pandas works
   >>> pdf.min(numeric_only=False)
   A    1
   B    a
   dtype: object
   
   # Pandas API on Spark doesn't work
   >>> ps.from_pandas(pdf).min(numeric_only=False)
   ...
   pyarrow.lib.ArrowInvalid: Could not convert 'a' with type str: tried to convert to int64
   ```
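To make the two behaviors concrete, here is a minimal plain-Python sketch of what `numeric_only` controls for a column-wise `min`. It uses neither pandas nor Spark: `frame_min` and `to_int64` are hypothetical helpers, and the strict integer cast only mimics the kind of conversion the traceback above reports. It is not how pandas API on Spark is implemented.

```python
from numbers import Number


def frame_min(columns, numeric_only=False):
    """Column-wise minimum over a dict mapping column name -> list of values.

    Hypothetical helper: it mirrors only the semantics of ``numeric_only``.
    """
    result = {}
    for name, values in columns.items():
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        if numeric_only and not is_numeric:
            # numeric_only=True: non-numeric columns are dropped from the result
            continue
        # numeric_only=False: every column contributes, so the result mixes types
        result[name] = min(values)
    return result


def to_int64(result):
    """Hypothetical strict cast that rejects non-integer values, similar in
    spirit to the 'tried to convert to int64' failure shown above."""
    out = {}
    for name, value in result.items():
        if not isinstance(value, int):
            raise TypeError(
                f"Could not convert {value!r} with type "
                f"{type(value).__name__}: tried to convert to int64"
            )
        out[name] = value
    return out


data = {"A": [1, 2, 3], "B": ["a", "b", "c"]}

print(frame_min(data, numeric_only=True))   # {'A': 1}
print(frame_min(data, numeric_only=False))  # {'A': 1, 'B': 'a'}

try:
    # Forcing the mixed-type result into a single integer type fails
    to_int64(frame_min(data, numeric_only=False))
except TypeError as exc:
    print(exc)
```

With `numeric_only=False`, the result necessarily holds both an integer and a string, which is why pandas reports `dtype: object`; any code path that assumes a single numeric result type will fail on column `B`.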
   
   But on second thought, this is a bug in our own code in Pandas API on Spark, so we can support `numeric_only=False` as the default by fixing that existing bug.
   
   Let me just close this ticket and change the default value instead.
   
   Thanks for pointing this out, @zhengruifeng!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

