Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19792
Can we set the `names` for the `_createFromRDD` -> `_inferSchema` path too? For example:
```python
>>> spark.createDataFrame(
...     spark.sparkContext.parallelize([[None, 1], ["a", None], [1, 1]]),
...     schema=["a", "b"], samplingRatio=0.99)
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hyukjinkwon/Desktop/workspace/repos/forked/spark/python/pyspark/sql/session.py", line 644, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/Users/hyukjinkwon/Desktop/workspace/repos/forked/spark/python/pyspark/sql/session.py", line 383, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/Users/hyukjinkwon/Desktop/workspace/repos/forked/spark/python/pyspark/sql/session.py", line 375, in _inferSchema
    schema = rdd.map(_infer_schema).reduce(_merge_type)
  File "/.../spark/python/pyspark/rdd.py", line 852, in reduce
    return reduce(f, vals)
  File "/.../spark/python/pyspark/sql/types.py", line 1133, in _merge_type
    for f in a.fields]
  File "/.../spark/python/pyspark/sql/types.py", line 1126, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: field _1: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.LongType'>
```
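To illustrate why passing the names through matters, here is a minimal standalone sketch. It is not PySpark's actual code: `infer_row`, `merge_field_types` and `infer_schema` are hypothetical helpers, and types are tracked as plain Python type names rather than Spark `DataType`s. When no names are given, per-row inference falls back to positional names (`_1`, `_2`, ..., as in the traceback above), so the merge error can only say `field _1`. Threading `names` through lets it report the user's column name instead.

```python
from functools import reduce

# Hypothetical, simplified stand-ins for schema inference; these are NOT
# pyspark.sql.types APIs, just an illustration of the names plumbing.

def infer_row(row, names=None):
    """Map each value of a row to a (field_name, type_name) pair."""
    if names is None:
        # Without column names, fall back to positional names _1, _2, ...
        names = ["_%d" % (i + 1) for i in range(len(row))]
    return [(name, "NoneType" if v is None else type(v).__name__)
            for name, v in zip(names, row)]

def merge_field_types(a, b):
    """Merge two inferred row schemas; include the field name on conflict."""
    merged = []
    for (name, ta), (_, tb) in zip(a, b):
        if ta == "NoneType":
            merged.append((name, tb))        # unknown type yields to the other
        elif tb == "NoneType" or ta == tb:
            merged.append((name, ta))
        else:
            raise TypeError("field %s: Can not merge type %s and %s"
                            % (name, ta, tb))
    return merged

def infer_schema(rows, names=None):
    """Infer each row's schema, then fold them together pairwise."""
    return reduce(merge_field_types, (infer_row(r, names) for r in rows))
```

With `rows = [[None, 1], ["a", None], [1, 1]]`, `infer_schema(rows)` raises `TypeError: field _1: Can not merge type str and int`, while `infer_schema(rows, names=["a", "b"])` raises `TypeError: field a: Can not merge type str and int`.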