Github user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156333302
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
         elif hasattr(row, "_fields"):  # namedtuple
             items = zip(row._fields, tuple(row))
         else:
-            names = ['_%d' % i for i in range(1, len(row) + 1)]
+            if names is None:
+                names = ['_%d' % i for i in range(1, len(row) + 1)]
--- End diff ---
You're right, but by reverting we lose the nicer message. Notice in the traceback below, produced with the change reverted, that it says field `_1` where it could have said field `col1`:
```
>>> spark.createDataFrame([["a", "b"], [1, 2]], ["col1"]).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 646, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1132, in _merge_type
    for f in a.fields]
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1125, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: field _1: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.LongType'>
```
Instead, I am adding a new `elif` branch that checks `len(names)` against `len(row)`. If we have fewer names than we have columns, we extend the names list, padding it with generated entries such as `_2`:
```
>>> spark.createDataFrame([["a", "b"]], ["col1"])
DataFrame[col1: string, _2: string]
```
I have included a test for this as well.