HyukjinKwon commented on code in PR #40459:
URL: https://github.com/apache/spark/pull/40459#discussion_r1139257625
##########
python/docs/source/migration_guide/pyspark_upgrade.rst:
##########
@@ -33,6 +33,7 @@ Upgrading from PySpark 3.3 to 3.4
 * In Spark 3.4, the ``Series.concat`` sort parameter will be respected to follow pandas 1.4 behaviors.
 * In Spark 3.4, the ``DataFrame.__setitem__`` will make a copy and replace pre-existing arrays, which will NOT be over-written to follow pandas 1.4 behaviors.
 * In Spark 3.4, the ``SparkSession.sql`` and the Pandas on Spark API ``sql`` have got new parameter ``args`` which provides binding of named parameters to their SQL literals.
+* In Spark 3.4, Pandas-on-Spark supports for the upcoming pandas 2.0. As a result, some APIs that are deprecated or removed in pandas 2.0 also be affected in Spark 3.4. Please refer to the [official pandas release notes](https://pandas.pydata.org/docs/dev/whatsnew/) for more details.

Review Comment:
```suggestion
* In Spark 3.4, Pandas API on Spark follows pandas 2.0, and some APIs were deprecated or removed in Spark 3.4 according to the changes made in pandas 2.0. Please refer to the [release notes of pandas](https://pandas.pydata.org/docs/dev/whatsnew/) for more details.
```
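For context on the quoted migration note about the new ``args`` parameter of ``SparkSession.sql``, here is a minimal sketch of how named parameter binding can look in Spark 3.4. The query, the ``range(10)`` source, and the ``minId`` parameter name are illustrative, not taken from this PR; per the migration note, the bound values are treated as SQL literals in 3.4.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Named parameter markers (:minId) in the query text are bound through the
# `args` dict; in Spark 3.4 the supplied values are interpreted as SQL literals.
df = spark.sql(
    "SELECT * FROM range(10) WHERE id > :minId",
    args={"minId": "7"},
)
df.show()
```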
