panbingkun commented on code in PR #44864:
URL: https://github.com/apache/spark/pull/44864#discussion_r1467155696


##########
python/pyspark/pandas/__init__.py:
##########
@@ -35,7 +35,9 @@
 except ImportError as e:
     if os.environ.get("SPARK_TESTING"):
         warnings.warn(str(e))
-        sys.exit()
+        # Run test without pandas/pyarrow on PyPy

Review Comment:
   > This fix is related to #44778.
   > 
   > IIRC we once tried to skip PyPy in CI because `numpy`, `pandas`, and 
`pyarrow` are not available on PyPy, but we decided to skip only the problematic 
doctests instead of skipping PyPy entirely (discussed in [#44778 
(comment)](https://github.com/apache/spark/pull/44778#discussion_r1457628763)).
   > 
   > So, we un-skipped PyPy in CI in 
[9d905aa](https://github.com/apache/spark/commit/9d905aaf0591b4d0f57b2823c613efbf74ef23f5),
 but the `assertDataFrameEqual` function still depends on `pyspark.pandas`, which 
requires `pandas` and `pyarrow`, as we can see in this fixed file 
(`pyspark/pandas/__init__.py`), so [multiple tests keep 
failing](https://github.com/panbingkun/spark/actions/runs/7634418170/job/20798310295#step:12:4443).
   > 
   > So what I'm trying to do here is: if the currently running Python 
implementation is PyPy, allow running the tests without `pandas` and `pyarrow`, 
which are unavailable on PyPy.
   > 
   > @panbingkun could you double-check whether I understood correctly, based on 
our recent discussion?
   
   At present, this issue is not in the `doctests` but in the regular unit tests; 
we have already skipped the doctests using `# doctest: +SKIP`.
   
   The root cause is that the above tests use the `assertDataFrameEqual` method, 
which heavily relies on `pyarrow` and `pandas`. If those dependencies are not 
installed in the environment, the tests fail.
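   
   For context, the kind of guard being discussed could look like the sketch 
below. This is illustrative only, not the actual PR change (the PR simply drops 
the `sys.exit()` call under `SPARK_TESTING`); the `is_pypy` helper and its 
`impl` parameter are hypothetical names introduced here for the example.
   
   ```python
   from typing import Optional
   import platform


   def is_pypy(impl: Optional[str] = None) -> bool:
       """Return True when running (or simulating) the PyPy interpreter.

       The `impl` parameter is a hypothetical hook for testing; by default
       it queries the running interpreter via platform.python_implementation().
       """
       impl = impl if impl is not None else platform.python_implementation()
       return impl == "PyPy"


   # In pyspark/pandas/__init__.py, the ImportError handler could then skip
   # the hard exit only on PyPy, so tests that do not need pandas still run:
   #
   #     if os.environ.get("SPARK_TESTING"):
   #         warnings.warn(str(e))
   #         if not is_pypy():
   #             sys.exit()
   ```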



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
