GitHub user gberger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19792#discussion_r156338937
--- Diff: python/pyspark/sql/types.py ---
@@ -1083,7 +1083,8 @@ def _infer_schema(row):
         elif hasattr(row, "_fields"):  # namedtuple
             items = zip(row._fields, tuple(row))
         else:
-            names = ['_%d' % i for i in range(1, len(row) + 1)]
+            if names is None:
+                names = ['_%d' % i for i in range(1, len(row) + 1)]
--- End diff --
If we revert it, the original purpose of the PR is lost: the error names the field by position (`_2`) rather than by its column name (`b`):
```
>>> df = pd.DataFrame(data={'a':[1,2,3], 'b': [4, 5, 'hello']})
>>> spark.createDataFrame(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 646, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
    schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1132, in _merge_type
    for f in a.fields]
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1125, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: field _2: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
```
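To make the difference concrete without a Spark session, here is a rough, self-contained sketch of the fallback in the diff. The helpers `default_field_names` and `merge_error` are hypothetical stand-ins written for illustration, not PySpark functions, and the type check compares plain Python types rather than Spark SQL types.

```python
def default_field_names(row, names=None):
    # Hypothetical helper: keep caller-supplied names, otherwise fall back
    # to positional names _1, _2, ... as in the diff above.
    if names is None:
        names = ['_%d' % i for i in range(1, len(row) + 1)]
    return list(names)


def merge_error(rows, names=None):
    # Toy stand-in for schema merging (an assumption, not Spark's logic):
    # compare Python types per column and return the first conflict as a
    # message, or None if every row agrees with the first one.
    names = default_field_names(rows[0], names)
    expected = [type(value) for value in rows[0]]
    for row in rows[1:]:
        for name, want, value in zip(names, expected, row):
            if type(value) is not want:
                return "field %s: Can not merge type %s and %s" % (
                    name, want.__name__, type(value).__name__)
    return None


rows = [(1, 4), (2, 5), (3, 'hello')]
print(merge_error(rows))                    # reports the positional name
print(merge_error(rows, names=['a', 'b']))  # reports the column name
```

With names passed through, the message points at `b`; with the unconditional assignment it can only ever point at `_2`.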