Re: [PR] [SPARK-46824][PS][BUILD] Enable Pandas-on-Spark test without optional dependency on PyPy [spark]

via GitHub Wed, 24 Jan 2024 00:23:17 -0800


itholic commented on code in PR #44864:
URL: https://github.com/apache/spark/pull/44864#discussion_r1464499226



##########
python/pyspark/pandas/__init__.py:
##########
@@ -35,7 +35,9 @@
 except ImportError as e:
     if os.environ.get("SPARK_TESTING"):
         warnings.warn(str(e))
-        sys.exit()
+        # Run test without pandas/pyarrow on PyPy

Review Comment:
   This fix is related to https://github.com/apache/spark/pull/44778.
   
   IIRC, we once tried to skip PyPy in CI because `numpy`, `pandas`, and 
`pyarrow` are not available on PyPy, but we decided to skip only the problematic 
doctests instead of skipping PyPy entirely (discussed in 
https://github.com/apache/spark/pull/44778#discussion_r1457628763).
   
   So we re-enabled PyPy in CI in 
https://github.com/apache/spark/commit/9d905aaf0591b4d0f57b2823c613efbf74ef23f5,
but the `assertDataFrameEqual` function still depends on `pyspark.pandas`, which 
requires `pandas` and `pyarrow` (as the file fixed here, 
`pyspark/pandas/__init__.py`, shows), so the [test keeps 
failing](https://github.com/panbingkun/spark/actions/runs/7634418170/job/20798310295#step:12:4443).
   
   So what I'm trying to do here is: if the currently running Python 
implementation is PyPy, allow the tests to run without `pandas` and `pyarrow`, 
which are unavailable on PyPy.
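   The guard described above could look roughly like the following minimal sketch. It is a hypothetical illustration, not the code in PR #44864: the helper name `guard_optional_deps` and its `implementation`/`environ` parameters are invented for the example, PyPy detection is assumed to use `platform.python_implementation()`, and re-raising outside `SPARK_TESTING` is an assumption because the hunk above does not show that branch.

```python
import os
import platform
import sys
import warnings


def guard_optional_deps(error, implementation=None, environ=None):
    """Hypothetical helper sketching the import guard; not the PR's actual code.

    Called from an `except ImportError` block when pandas/pyarrow are missing.
    """
    if implementation is None:
        # Returns e.g. "CPython" or "PyPy".
        implementation = platform.python_implementation()
    if environ is None:
        environ = os.environ
    if not environ.get("SPARK_TESTING"):
        # Assumption: outside of testing, a missing dependency stays a hard error.
        raise error
    warnings.warn(str(error))
    if implementation == "PyPy":
        # pandas/pyarrow are unavailable on PyPy, so return normally and let
        # tests that do not need them keep running.
        return
    # On other implementations (e.g. CPython), keep the old behavior of
    # stopping the test run.
    sys.exit()
```

   In `pyspark/pandas/__init__.py` the call would sit where `sys.exit()` used to be, i.e. `except ImportError as e: guard_optional_deps(e)`. Passing the implementation name and environment explicitly only keeps the sketch easy to test.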
   
   @panbingkun could you double-check whether I understood this correctly, based on our 
recent discussion?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

