bjornjorgensen commented on PR #41812:
URL: https://github.com/apache/spark/pull/41812#issuecomment-1629670914
Ehh.. this gets more confusing each time I read it.
"My point was that we should focus on supporting the latest version of
pandas rather than pandas 1.5.3 anyway."
No, we are soon going to release `Apache Spark 3.5`, and we need to make
that a great release.
Some `pandas API on Spark` functions, such as `.info()`, rely directly on
`pandas`. `.info()` has changed a lot between `pandas` 1.5.3 and 2.0.3, and we
don't have any tests for this.
Users need to install a `pandas` version to use the `pandas API on Spark`.
If we have users install `pandas` 2.0.3 while we only support `pandas` 1.5.3
in `pandas API on Spark`, then users calling functions like `.to_pandas()`
will still end up working with `pandas` 2.0.3.
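To make the mismatch concrete: a minimal sketch of the kind of version guard a project could use to decide whether an installed `pandas` falls in its supported range. The function names (`parse_version`, `is_supported`) and the cutoff are hypothetical illustrations, not anything from this PR or the Spark codebase.

```python
def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Hypothetical minimum: pandas 1.5.3, the version discussed in this thread.
MIN_SUPPORTED = parse_version("1.5.3")

def is_supported(pandas_version: str) -> bool:
    """True if the given pandas version is at or above the minimum.

    Behavior-dependent code (e.g. anything that checks the output of
    DataFrame.info(), which changed between 1.5.3 and 2.0.3) could
    branch on a check like this.
    """
    return parse_version(pandas_version) >= MIN_SUPPORTED
```

In practice one would feed this `pandas.__version__` (or use `packaging.version.Version` for full PEP 440 handling); this sketch only shows the shape of the check.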
What if we make one PR that reverts this PR, the one that updated to
`pandas` 2.0.2, and some others, so we are back on `pandas` 1.5.3? Then, right
after we release Apache Spark 3.5.0, we revert that PR. Can that be a solution?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]