itholic commented on code in PR #41211:
URL: https://github.com/apache/spark/pull/41211#discussion_r1199889980
##########
python/pyspark/pandas/tests/data_type_ops/test_date_ops.py:
##########
@@ -61,6 +63,10 @@ def test_add(self):
         for psser in self.pssers:
             self.assertRaises(TypeError, lambda: self.psser + psser)
+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-43571): Enable DateOpsTests.test_sub for pandas 2.0.0.",
+    )
Review Comment:
> "There should be no behavior changes unless it is a major release" where
does this come from?
AFAIK, it mainly comes from the "Considerations when breaking APIs" section
of the [Versioning Policy](https://spark.apache.org/versioning-policy.html).
It is also a rule that every component other than the pandas API on Spark
follows uniformly.
> The whole point of the pandas API on spark is that it should be as similar
as possible to pandas and then there must also be some behavior changes,
because it has happened in pandas.
Yes, and as @HyukjinKwon mentioned in
https://github.com/apache/spark/pull/41211#discussion_r1197553147, that's why
we need to discuss whether the pandas API on Spark should be allowed to make
breaking changes to some APIs, as an exception to the rule the other
components follow.
> Are there any other ways than one person having to fix everything in one
and the same PR? e.g. that you mark all tests that fail with
@pytest.mark.skip(reason="see JIRA XXXX for updating to pandas 2.0") and you
also create a JIRA for that. So that can more people can help upgrade the
pandas API on spark to version 2.0?
Yes, as I mentioned in
https://github.com/apache/spark/pull/41211#discussion_r1197445641, I am
currently skipping all failing tests and creating tickets for them so that
others can participate in the pandas upgrade. This PR lays the groundwork for
that: after it is merged, anyone who wants to contribute to the pandas API on
Spark can pick a ticket from the list at SPARK-42618.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.