[GitHub] [spark] itholic commented on a diff in pull request #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

via GitHub Sun, 21 May 2023 19:30:46 -0700


itholic commented on code in PR #41211:
URL: https://github.com/apache/spark/pull/41211#discussion_r1199889980



##########
python/pyspark/pandas/tests/data_type_ops/test_date_ops.py:
##########
@@ -61,6 +63,10 @@ def test_add(self):
         for psser in self.pssers:
             self.assertRaises(TypeError, lambda: self.psser + psser)
 
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-43571): Enable DateOpsTests.test_sub for pandas 2.0.0.",
+    )

Review Comment:
   > "There should be no behavior changes unless it is a major release" where does this come from?
   
   AFAIK, this rule mainly exists to follow the "Considerations when breaking APIs" section of the [Versioning Policy](https://spark.apache.org/versioning-policy.html). It is also applied uniformly across all components other than the pandas API on Spark.
   
   
   > The whole point of the pandas API on spark is that it should be as similar as possible to pandas and then there must also be some behavior changes, because it has happened in pandas.
   
   Yes, as @HyukjinKwon mentioned in https://github.com/apache/spark/pull/41211#discussion_r1197553147, that's why we need to discuss whether the pandas API on Spark should be allowed, as an exception to the other components, to make breaking changes for some APIs.
   
   > Are there any other ways than one person having to fix everything in one and the same PR? e.g. that you mark all tests that fail with @pytest.mark.skip(reason="see JIRA XXXX for updating to pandas 2.0") and you also create a JIRA for that. So that can more people can help upgrade the pandas API on spark to version 2.0?
   
   Yes, as I mentioned in https://github.com/apache/spark/pull/41211#discussion_r1197445641, I am currently skipping all failing tests and creating tickets so that others can participate in the pandas upgrade. This PR lays the groundwork for that.
   
   Once this PR is merged, anyone who wants to contribute to the pandas API on Spark can pick a ticket from the list at SPARK-42618 and help with the pandas upgrade.
