GitHub user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@ueshin
The reason I modified the StructType case is that, in
session.py#341, for each Pandas DF row we obtain a StructType whose StructFields
map column names to value types; these per-row schemas are then reduced with `_merge_type`.
I do appreciate that a Pandas DF could also contain lists
or dicts as values. I pushed a new commit where the `name` property is passed
down when we recurse via ArrayType or MapType.
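Roughly, the idea is the following simplified sketch (not the exact code in the commit; merging of extra fields from `b` into the StructType is omitted here):
```
# Simplified sketch of the idea, not the exact diff: thread the column/field
# name through the recursive merge so that a type conflict inside a nested
# ArrayType or MapType is still reported against the column that produced it.
from pyspark.sql.types import ArrayType, MapType, NullType, StructField, StructType


def _merge_type(a, b, name=None):
    if isinstance(a, NullType):
        return b
    elif isinstance(b, NullType):
        return a
    elif type(a) is not type(b):
        raise TypeError("Can not merge type %s and %s in field '%s'"
                        % (type(a), type(b), name))

    if isinstance(a, StructType):
        # Merge field by field, passing each field's own name down.
        nfs = dict((f.name, f.dataType) for f in b.fields)
        return StructType([
            StructField(f.name,
                        _merge_type(f.dataType, nfs.get(f.name, NullType()),
                                    name=f.name))
            for f in a.fields])
    elif isinstance(a, ArrayType):
        # An array column's element type inherits the enclosing column's name.
        return ArrayType(_merge_type(a.elementType, b.elementType, name=name),
                         True)
    elif isinstance(a, MapType):
        # Likewise for the key and value types of a map column.
        return MapType(_merge_type(a.keyType, b.keyType, name=name),
                       _merge_type(a.valueType, b.valueType, name=name),
                       True)
    else:
        return a
```
With the name threaded down like this, a mismatch inside an array or map value is reported against the top-level column, as in the dict example further below.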
Here is what it looks like when we use lists or dicts inside a Pandas DF:
```
>>> df = pd.DataFrame(data={
... 'a': [[1, 2], [3, 4]],
... 'b': [[5, 'hello'], [7, 8]]
... })
>>> sdf = sql.createDataFrame(df)
>>> sdf
DataFrame[a: array<bigint>, b: array<bigint>]
>>> sdf.show()
+------+---------+
| a| b|
+------+---------+
|[1, 2]|[5, null]|
|[3, 4]| [7, 8]|
+------+---------+
```
```
>>> df = pd.DataFrame(data={
... 'a': [{1: 2}, {3: 4}],
... 'b': [{5: 'hello'}, {7: 8}]
... })
>>> sdf = sql.createDataFrame(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gberger/Projects/spark/python/pyspark/sql/context.py", line 354, in createDataFrame
    return self.sparkSession.createDataFrame(data, schema, samplingRatio, verifySchema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 646, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
    schema = reduce(_merge_type, [_infer_schema(row, names) for row in data])
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1128, in _merge_type
    for f in a.fields]
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1128, in <listcomp>
    for f in a.fields]
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1140, in _merge_type
    _merge_type(a.valueType, b.valueType, name=name),
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1122, in _merge_type
    raise TypeError("Can not merge type %s and %s in field '%s'" % (type(a), type(b), name))
TypeError: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.LongType'> in field 'b'
```