itholic commented on code in PR #44864:
URL: https://github.com/apache/spark/pull/44864#discussion_r1464499226
##########
python/pyspark/pandas/__init__.py:
##########
@@ -35,7 +35,9 @@
except ImportError as e:
if os.environ.get("SPARK_TESTING"):
warnings.warn(str(e))
- sys.exit()
+ # Run test without pandas/pyarrow on PyPy
Review Comment:
This fix is related to https://github.com/apache/spark/pull/44778.
IIRC we once tried to skip PyPy in CI because `numpy`, `pandas`, and
`pyarrow` are not available on PyPy, but we decided to skip only the problematic
doctests instead of skipping PyPy entirely (discussed in
https://github.com/apache/spark/pull/44778#discussion_r1457628763).
So we un-skipped the PyPy CI in
https://github.com/apache/spark/commit/9d905aaf0591b4d0f57b2823c613efbf74ef23f5,
but the `assertDataFrameEqual` function still depends on `pyspark.pandas`, which
requires `pandas` and `pyarrow`, as we can see in the file changed here
(`pyspark/pandas/__init__.py`), so the [tests keep
failing](https://github.com/panbingkun/spark/actions/runs/7634418170/job/20798310295#step:12:4443).
So what I'm trying to do here is: if the running Python implementation is
PyPy, allow the tests to run without `pandas` and `pyarrow`, which are
unavailable on PyPy.
@panbingkun could you double-check whether I understood this correctly, based on
our recent discussion?
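
To make the intended behaviour concrete, here is a rough sketch of the decision described above. It is not the exact code in this PR: the helper name `decide_on_import_error` and its string return values are hypothetical, and the re-raise outside of `SPARK_TESTING` is an assumption about the surrounding code that the hunk does not show. The only real API relied on is `platform.python_implementation()`, which returns `"PyPy"` on PyPy.

```python
import platform


def decide_on_import_error(env, implementation=None):
    """Hypothetical helper: choose how to react when pandas/pyarrow fail to import.

    Returns one of:
      "continue" - testing on PyPy, keep going without pandas/pyarrow
      "exit"     - testing on another implementation, stop as before
      "raise"    - not under SPARK_TESTING, re-raise (assumed existing behaviour)
    """
    if implementation is None:
        # Real stdlib call; yields "CPython", "PyPy", etc.
        implementation = platform.python_implementation()

    if env.get("SPARK_TESTING"):
        if implementation == "PyPy":
            # pandas and pyarrow are unavailable on PyPy, so do not exit here.
            return "continue"
        return "exit"
    return "raise"
```

In `pyspark/pandas/__init__.py` this would roughly correspond to calling `sys.exit()` only when the result is `"exit"`, after the existing `warnings.warn(str(e))`.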
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]