Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20487#discussion_r165865284
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1923,6 +1923,9 @@ def toPandas(self):
0 2 Alice
1 5 Bob
"""
+ from pyspark.sql.utils import require_minimum_pandas_version
--- End diff ---
Ah, that one is about PyArrow, whereas this one is about Pandas. I wanted to
produce a proper error message before `import pandas as pd` is reached :-).
The case above (https://github.com/apache/spark/pull/20487/files#r165714499) is
when Pandas is older than 0.19.2. When Pandas is missing entirely, it shows
something like:
```
>>> spark.range(1).toPandas()
```
before:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/dataframe.py", line 1975, in toPandas
import pandas as pd
ImportError: No module named pandas
```
after:
```
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/dataframe.py", line 1927, in toPandas
require_minimum_pandas_version()
File "/.../spark/python/pyspark/sql/utils.py", line 125, in
require_minimum_pandas_version
"it was not found." % minimum_pandas_version)
ImportError: Pandas >= 0.19.2 must be installed; however, it was not found.
```