GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20487#discussion_r166547502
--- Diff: python/pyspark/sql/session.py ---
@@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
         except Exception:
             has_pandas = False
         if has_pandas and isinstance(data, pandas.DataFrame):
+            from pyspark.sql.utils import require_minimum_pandas_version
+            require_minimum_pandas_version()
--- End diff --
I don't think I know all the places exactly. For now, I can think of:
`createDataFrame` with a pandas DataFrame as input, `toPandas`, and
`pandas_udf` among the APIs, and some places in `session.py` / `types.py`
for internal methods such as the `_check*` family, `*arrow*`, or `*pandas*`.
I was thinking of working on moving those into a single module (file)
after 2.3.0. I will cc you and @ueshin there.