[GitHub] [spark] bjornjorgensen commented on a diff in pull request #41211: [SPARK-43024][PYTHON] Upgrade pandas to 2.0.0

via GitHub Fri, 19 May 2023 06:14:52 -0700


bjornjorgensen commented on code in PR #41211:
URL: https://github.com/apache/spark/pull/41211#discussion_r1198937746



##########
python/pyspark/pandas/tests/data_type_ops/test_date_ops.py:
##########
@@ -61,6 +63,10 @@ def test_add(self):
         for psser in self.pssers:
             self.assertRaises(TypeError, lambda: self.psser + psser)
 
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-43571): Enable DateOpsTests.test_sub for pandas 2.0.0.",
+    )

Review Comment:
   Hmm, I'm a bit puzzled and wonder whether there is a misunderstanding of the 
language here.
   Where does "There should be no behavior changes unless it is a major 
release" come from?
   What @gatorsmile said was "we should not remove these API before the major 
release Spark 4.0". 
   Did he mean that functions that work well but have been removed in pandas 
2.0 should not be removed here? He also wrote this under `def append`, which 
was meant to be removed in your previous PR.
   The whole point of the `pandas API on spark` is that it should be as similar 
as possible to `pandas`, so there must also be some behavior changes, because 
they have happened in pandas.
   Is there any way to avoid one person having to fix everything in a single 
PR? For example, you could mark all failing tests with 
`@pytest.mark.skip(reason="see JIRA XXXX for updating to pandas 2.0")` and 
also create a JIRA for that, so that more people can help upgrade the 
`pandas API on spark` to pandas 2.0?
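
   A minimal sketch of what I mean, using `unittest.skipIf` as the diff above 
does rather than `pytest`. The version string, the `version_tuple` helper and 
the test class are stand-ins I made up so that it runs without pandas 
installed; only the skip reason is copied from the diff.

```python
import unittest

# Assumption: stand-in for pd.__version__, so the sketch runs without pandas.
INSTALLED_PANDAS_VERSION = "2.0.0"


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: parse a plain 'X.Y.Z' string into a tuple of ints.

    It does not handle rc/dev suffixes; a real check would use a proper
    version parser.
    """
    return tuple(int(part) for part in version.split(".")[:3])


PANDAS_GE_2 = version_tuple(INSTALLED_PANDAS_VERSION) >= (2, 0, 0)


class DateOpsSketchTests(unittest.TestCase):
    """Hypothetical test class; the real tests live in test_date_ops.py."""

    @unittest.skipIf(
        PANDAS_GE_2,
        "TODO(SPARK-43571): Enable DateOpsTests.test_sub for pandas 2.0.0.",
    )
    def test_sub(self):
        # Placeholder body standing in for the test that fails on pandas 2.0.
        self.assertEqual(1 - 1, 0)

    def test_add(self):
        # An unaffected test keeps running as before.
        self.assertEqual(1 + 1, 2)


def run_sketch() -> unittest.TestResult:
    """Run the sketch tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DateOpsSketchTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

   Each skipped test carries its own JIRA id in the reason, so the follow-up 
work can be split across several people and PRs, and the skip goes away once 
that ticket is fixed.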



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

