Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20487#discussion_r165714499
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1923,6 +1923,9 @@ def toPandas(self):
         0    2  Alice
         1    5    Bob
         """
+        from pyspark.sql.utils import require_minimum_pandas_version
--- End diff ---
`toPandas` already seems to fail when the DataFrame includes `TimestampType` columns:
```
>>> import datetime
>>> spark.createDataFrame([[datetime.datetime.now()]]).toPandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/dataframe.py", line 1978, in toPandas
    _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
  File "/.../spark/python/pyspark/sql/types.py", line 1775, in _check_series_convert_timestamps_local_tz
    return _check_series_convert_timestamps_localize(s, None, timezone)
  File "/.../spark/python/pyspark/sql/types.py", line 1750, in _check_series_convert_timestamps_localize
    require_minimum_pandas_version()
  File "/.../spark/python/pyspark/sql/utils.py", line 128, in require_minimum_pandas_version
    "your version was %s." % (minimum_pandas_version, pandas.__version__))
ImportError: Pandas >= 0.19.2 must be installed; however, your version was 0.16.0.
```
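For context, the helper raising here lives in `pyspark/sql/utils.py`. A minimal sketch of what it does, reconstructed from the error message in the traceback above (the actual wording and how the version constant is defined may differ):

```
# Sketch reconstructed from the traceback; the real helper in
# pyspark/sql/utils.py may differ in detail.
def require_minimum_pandas_version():
    """Raise ImportError if the installed pandas is older than the minimum."""
    from distutils.version import LooseVersion
    import pandas

    minimum_pandas_version = "0.19.2"
    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
        raise ImportError(
            "Pandas >= %s must be installed; however, "
            "your version was %s." % (minimum_pandas_version, pandas.__version__))
```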
Since we define a minimum supported version, I think we had better require it explicitly. Let me know if anyone thinks differently.
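Concretely, that amounts to something like the following at the top of `toPandas`, which is what the diff above adds the import for (a sketch; the rest of the method body is elided):

```
# Sketch of the proposed placement: fail fast on an old pandas before any
# conversion work, instead of only on the TimestampType code path.
def toPandas(self):
    from pyspark.sql.utils import require_minimum_pandas_version
    require_minimum_pandas_version()
    # ... existing conversion to a pandas DataFrame ...
```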