itholic commented on code in PR #44778:
URL: https://github.com/apache/spark/pull/44778#discussion_r1464191332
##########
python/pyspark/sql/tests/test_udf.py:
##########
@@ -917,6 +923,7 @@ def test_complex_return_types(self):
         self.assertEqual(row[1], {"a": "b"})
         self.assertEqual(row[2], Row(col1=1, col2=2))
+    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
     def test_named_arguments(self):
Review Comment:
Hmm... it seems that `assertDataFrameEqual` does not actually depend on
pandas?
```python
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> from pyspark.testing.utils import assertDataFrameEqual
>>> df1 = spark.range(10)
>>> df2 = spark.range(10)
>>> assertDataFrameEqual(df1, df2)
```
It works well without pandas. In the code, we skip using pandas if it's not
installed:
```python
has_pandas = False
try:
    # If pandas dependencies are available, allow pandas or pandas-on-Spark DataFrame
    import pyspark.pandas as ps
    import pandas as pd
    from pyspark.testing.pandasutils import PandasOnSparkTestUtils
    has_pandas = True
except ImportError:
    # no pandas, so we won't call pandasutils functions
    pass
```
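To illustrate that guard in isolation, here is a minimal sketch of the same optional-import pattern outside of PySpark. The helper names `optional_import` and `describe` are hypothetical and only for illustration; they are not PySpark APIs.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Module is not installed (or one of its own imports failed).
        return None


# Hypothetical flag mirroring the `has_pandas` guard quoted above.
pd = optional_import("pandas")
has_pandas = pd is not None


def describe(obj):
    """Only touch pandas types when pandas was actually imported."""
    if has_pandas and isinstance(obj, pd.DataFrame):
        return "pandas DataFrame"
    return "non-pandas object"
```

With this shape, code paths that never receive a pandas object should behave the same whether or not pandas is installed, which would match the REPL session above.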
> After testing, we found that the assertDataFrameEqual method used in this
> UT requires it.
Did you test with `pandas` completely uninstalled? We should uninstall
`pandas-stubs` as well, before uninstalling `pandas`. See
https://github.com/apache/spark/pull/44745#issuecomment-1894850179 for more details.
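One way to double-check the environment before re-running the test is a quick standard-library probe. This is only a rough sketch: `installed_version` and `is_importable` are hypothetical helper names, and it assumes the distributions are registered under the names `pandas` and `pandas-stubs`.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# Assumed distribution names; adjust if your environment differs.
for dist in ("pandas", "pandas-stubs"):
    print(f"{dist}: installed version = {installed_version(dist)}")

print(f"'import pandas' can be resolved: {is_importable('pandas')}")
```

If both distributions report `None` and `pandas` cannot be resolved, the environment should be a fair approximation of "no pandas" for the purposes of this test.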