bjornjorgensen commented on PR #41812:
URL: https://github.com/apache/spark/pull/41812#issuecomment-1629670914
Ehh.. this gets more confusing each time I read it.
"My point was that we should focus on supporting the latest version of
pandas rather than pandas 1.5.3 anyway."
No, we are soon going to release `Apache Spark 3.5`, and we need to make
that a great release.
Some `pandas API on Spark` functions, such as `.info()`, rely directly on
`pandas`. `.info()` has changed a lot between `pandas` 1.5.3 and 2.0.3, and we
don't have any tests for this.
Users need to install a `pandas` version to use the `pandas API on Spark`.
If we have users install `pandas` 2.0.3 while we only support `pandas` 1.5.3
in `pandas API on Spark`, then users calling functions like `.to_pandas()`
will still end up working with `pandas` 2.0.3.
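To make the mismatch concrete: a minimal sketch of the kind of version guard a project could use to decide whether an installed `pandas` falls in its supported range. The function names (`parse_version`, `is_supported`) and the cutoff are hypothetical illustrations, not anything from this PR or the Spark codebase.

```python
def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Hypothetical minimum: pandas 1.5.3, the version discussed in this thread.
MIN_SUPPORTED = parse_version("1.5.3")

def is_supported(pandas_version: str) -> bool:
    """True if the given pandas version is at or above the minimum.

    Behavior-dependent code (e.g. anything that checks the output of
    DataFrame.info(), which changed between 1.5.3 and 2.0.3) could
    branch on a check like this.
    """
    return parse_version(pandas_version) >= MIN_SUPPORTED
```

In practice one would feed this `pandas.__version__` (or use `packaging.version.Version` for full PEP 440 handling); this sketch only shows the shape of the check.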
What if we make one PR that reverts this PR, the one that updated to
`pandas` 2.0.2, and some others, so we are back on `pandas` 1.5.3? Then, right
after we release Apache Spark 3.5.0, we revert that PR. Can that be a solution?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]