Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156337091
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
     elif hasattr(row, "_fields"):  # namedtuple
         items = zip(row._fields, tuple(row))
     else:
-        names = ['_%d' % i for i in range(1, len(row) + 1)]
+        if names is None:
+            names = ['_%d' % i for i in range(1, len(row) + 1)]
--- End diff --
Ah, yup. I noticed it too, but I think the same thing applies to other
cases. For example:
```
>>> spark.createDataFrame([{"a": 1}, {"a": []}], ["col1"])
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 646, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/.../spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/.../spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/.../spark/python/pyspark/sql/types.py", line 1133, in _merge_type
    for f in a.fields]
  File "/.../spark/python/pyspark/sql/types.py", line 1126, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: field a: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.ArrayType'>
```
So let's revert this change here for now. There are some subtleties here,
but I think it's fine.
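
For anyone following along, here is a rough, Spark-free sketch of why the
example above fails: each row is inferred on its own, and the per-row results
are then folded together with `reduce`, so two rows that disagree on the type
of field `a` cannot be merged. All names below (`infer_type`, `merge_type`,
and so on) and the string type labels are hypothetical stand-ins for
illustration, not the actual PySpark implementation.

```python
from functools import reduce


def infer_type(value):
    """Toy per-value type inference (assumption: only a few types matter)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, list):
        return "array"
    raise TypeError("unsupported value: %r" % (value,))


def infer_schema(row):
    """Infer a toy schema (field name -> type label) from one dict row."""
    return {key: infer_type(value) for key, value in row.items()}


def merge_type(a, b, name=None):
    """Merge two type labels; "null" yields to the other side."""
    if a == "null":
        return b
    if b == "null":
        return a
    if a != b:
        prefix = "field %s: " % name if name else ""
        raise TypeError("%sCan not merge type %s and %s" % (prefix, a, b))
    return a


def merge_schema(a, b):
    """Merge two toy schemas field by field."""
    keys = sorted(set(a) | set(b))
    return {k: merge_type(a.get(k, "null"), b.get(k, "null"), name=k)
            for k in keys}


def infer_schema_from_list(rows):
    """Fold the per-row schemas into one, as the traceback's reduce does."""
    return reduce(merge_schema, (infer_schema(row) for row in rows))


if __name__ == "__main__":
    try:
        infer_schema_from_list([{"a": 1}, {"a": []}])
    except TypeError as exc:
        print(exc)  # field a: Can not merge type long and array
```

In this sketch the conflict comes purely from the row values, so a separately
supplied list of column names would not change the outcome.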