GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22653
[SPARK-25659][PYTHON] Test type inference specification for createDataFrame
in PySpark
## What changes were proposed in this pull request?
This PR proposes to specify type inference and simple e2e tests. Looks we
are not cleanly testing those logics.
For instance, see
https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/types.py#L894-L905
Looks we intended to support datetime.time and None for type inference too
but it does not work:
```
>>> spark.createDataFrame([[datetime.time()]])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/session.py", line 751, in
createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/.../spark/python/pyspark/sql/session.py", line 432, in
_createFromLocal
data = [schema.toInternal(row) for row in data]
File "/.../spark/python/pyspark/sql/types.py", line 604, in toInternal
for f, v, c in zip(self.fields, obj, self._needConversion))
File "/.../spark/python/pyspark/sql/types.py", line 604, in <genexpr>
for f, v, c in zip(self.fields, obj, self._needConversion))
File "/.../spark/python/pyspark/sql/types.py", line 442, in toInternal
return self.dataType.toInternal(obj)
File "/.../spark/python/pyspark/sql/types.py", line 193, in toInternal
else time.mktime(dt.timetuple()))
AttributeError: 'datetime.time' object has no attribute 'timetuple'
>>> spark.createDataFrame([[None]])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/session.py", line 751, in
createDataFrame
rdd, schema = self._createFromLocal(map(prepare, data), schema)
File "/.../spark/python/pyspark/sql/session.py", line 419, in
_createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
File "/.../python/pyspark/sql/session.py", line 353, in
_inferSchemaFromList
raise ValueError("Some of types cannot be determined after inferring")
ValueError: Some of types cannot be determined after inferring
```
## How was this patch tested?
Manual tests and unit tests were added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-25659
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22653.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22653
----
commit 8fddb9dca26ce36be1f7eaf0d356bf78070486f9
Author: hyukjinkwon <gurwls223@...>
Date: 2018-10-06T09:26:04Z
Test type inference specification for createDataFrame in PySpark
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]