GitHub user gberger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19792#discussion_r156338937
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1083,7 +1083,8 @@ def _infer_schema(row):
             elif hasattr(row, "_fields"):  # namedtuple
                 items = zip(row._fields, tuple(row))
             else:
    -            names = ['_%d' % i for i in range(1, len(row) + 1)]
    +            if names is None:
    +                names = ['_%d' % i for i in range(1, len(row) + 1)]
    --- End diff --
    
    If we revert it, the original purpose of the PR is lost: the error message falls back to the positional field name `_2` instead of the column name `b`:
    
    ```
    >>> df = pd.DataFrame(data={'a':[1,2,3], 'b': [4, 5, 'hello']})
    >>> spark.createDataFrame(df)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 646, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
        struct = self._inferSchemaFromList(data, names=schema)
      File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
        schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
      File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1132, in _merge_type
        for f in a.fields]
      File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1125, in _merge_type
        raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
    TypeError: field _2: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
    ```
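A minimal standalone sketch (no PySpark required) of why the `if names is None` guard matters. `infer_row_schema`, `merge_schemas` and `error_for` are hypothetical simplifications, not PySpark's real `_infer_schema`/`_merge_type`: they use Python type names (`int`, `str`) instead of Spark's `LongType`/`StringType`, but they keep the same shape. Positional names are generated only when none are supplied, and the merge step prefixes the conflicting field's name to the error.

```python
from functools import reduce


def infer_row_schema(row, names=None):
    """Return (field_name, python_type_name) pairs for one tuple-like row.

    Hypothetical simplification of the guarded branch in the diff:
    positional names `_1`, `_2`, ... are generated only when the caller
    did not supply column names.
    """
    if names is None:
        names = ['_%d' % i for i in range(1, len(row) + 1)]
    return [(name, type(value).__name__) for name, value in zip(names, row)]


def merge_schemas(a, b):
    """Merge two row schemas field by field, naming the field on conflict."""
    merged = []
    for (name, type_a), (_, type_b) in zip(a, b):
        if type_a != type_b:
            raise TypeError("field %s: Can not merge type %s and %s"
                            % (name, type_a, type_b))
        merged.append((name, type_a))
    return merged


# Rows equivalent to the pandas example: columns a=[1, 2, 3], b=[4, 5, 'hello'].
rows = [(1, 4), (2, 5), (3, 'hello')]


def error_for(names):
    """Run the reduce over all rows and return the merge error message, if any."""
    try:
        reduce(merge_schemas, (infer_row_schema(r, names) for r in rows))
    except TypeError as e:
        return str(e)
    return None


print(error_for(['a', 'b']))  # field b: Can not merge type int and str
print(error_for(None))        # field _2: Can not merge type int and str
```

With the column names threaded through, the error points at `b`; without them (the reverted behaviour), it can only report `_2`.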

