Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19646#discussion_r148782931
--- Diff: python/pyspark/sql/session.py ---
@@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
         except Exception:
             has_pandas = False
         if has_pandas and isinstance(data, pandas.DataFrame):
-            if schema is None:
-                schema = [str(x) for x in data.columns]
-            data = [r.tolist() for r in data.to_records(index=False)]
--- End diff ---
but ... `numpy.datetime64` is not supported by `createDataFrame`'s schema inference, IIUC:
```python
import pandas as pd
from datetime import datetime
pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})
# to_records() converts the datetime column to numpy.datetime64
print([[v for v in r] for r in pdf.to_records(index=False)])
spark.createDataFrame([[v for v in r] for r in pdf.to_records(index=False)])
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 591, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/.../spark/python/pyspark/sql/session.py", line 404, in _createFromLocal
    struct = self._inferSchemaFromList(data)
  File "/.../spark/python/pyspark/sql/session.py", line 336, in _inferSchemaFromList
    schema = reduce(_merge_type, map(_infer_schema, data))
  File "/.../spark/python/pyspark/sql/types.py", line 1095, in _infer_schema
    fields = [StructField(k, _infer_type(v), True) for k, v in items]
  File "/.../spark/python/pyspark/sql/types.py", line 1072, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <type 'numpy.datetime64'>
```
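To make the point concrete: `to_records()` hands back `numpy.datetime64` values, which is the type Spark's `_infer_type` rejects. Below is a minimal sketch (not part of this PR) of one possible workaround that converts each `numpy.datetime64` back to a plain `datetime` before the rows reach schema inference; the helper `to_py` is a hypothetical name introduced here for illustration:

```python
import numpy as np
import pandas as pd
from datetime import datetime

pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

# to_records() converts the datetime64[ns] column to numpy.datetime64,
# which Spark's type inference does not recognize.
records = [list(r) for r in pdf.to_records(index=False)]
assert isinstance(records[0][0], np.datetime64)

def to_py(v):
    # pd.Timestamp understands numpy.datetime64 and can round-trip it
    # back to a Python datetime.
    return pd.Timestamp(v).to_pydatetime() if isinstance(v, np.datetime64) else v

converted = [[to_py(v) for v in r] for r in records]
assert converted[0][0] == datetime(2017, 10, 31, 1, 1, 1)
# `converted` could then be passed to spark.createDataFrame without
# hitting the TypeError above.
```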
---