GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19792#discussion_r156337091
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1083,7 +1083,8 @@ def _infer_schema(row):
             elif hasattr(row, "_fields"):  # namedtuple
                 items = zip(row._fields, tuple(row))
             else:
    -            names = ['_%d' % i for i in range(1, len(row) + 1)]
    +            if names is None:
    +                names = ['_%d' % i for i in range(1, len(row) + 1)]
    --- End diff --
    
    Ah, yup. I noticed it too, but I think the same thing applies to other cases, for example:
    
    ```
    >>> spark.createDataFrame([{"a": 1}, {"a": []}], ["col1"])
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/session.py", line 646, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File "/.../spark/python/pyspark/sql/session.py", line 409, in _createFromLocal
        struct = self._inferSchemaFromList(data, names=schema)
      File "/.../spark/python/pyspark/sql/session.py", line 341, in _inferSchemaFromList
        schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
      File "/.../spark/python/pyspark/sql/types.py", line 1133, in _merge_type
        for f in a.fields]
      File "/.../spark/python/pyspark/sql/types.py", line 1126, in _merge_type
        raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
    TypeError: field a: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.ArrayType'>
    ```
    
    So, let's revert this change here for now. There are some subtleties here, but I think it's fine.
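
A simplified sketch of the field-naming branches touched by the diff may help show why the error above reports field `a` rather than the user-supplied `col1`. It assumes that only the plain tuple/list branch consults `names`, as in the diff, while dict rows keep their keys and namedtuples keep `_fields`. `infer_field_names` is a hypothetical helper for illustration, not a PySpark function, and it ignores type inference and `Row` objects entirely.

```python
from collections import namedtuple
from typing import Any, List, Optional, Sequence


def infer_field_names(row: Any, names: Optional[Sequence[str]] = None) -> List[str]:
    """Hypothetical, simplified mirror of the naming branches in the diff."""
    if isinstance(row, dict):
        # dict rows: keys become field names; user-supplied names are ignored
        return list(row)
    if hasattr(row, "_fields"):
        # namedtuple rows: the declared field names win over `names`
        return list(row._fields)
    if names is None:
        # plain tuples/lists without names: fall back to _1, _2, ...
        return ['_%d' % i for i in range(1, len(row) + 1)]
    # plain tuples/lists with names (this sketch assumes len(names) == len(row))
    return list(names)


Point = namedtuple("Point", ["x", "y"])
print(infer_field_names({"a": 1}, ["col1"]))         # ['a']  (names ignored)
print(infer_field_names(Point(1, 2), ["c1", "c2"]))  # ['x', 'y']
print(infer_field_names((1, 2), ["c1", "c2"]))       # ['c1', 'c2']
print(infer_field_names((1, 2)))                     # ['_1', '_2']
```

Because the dict branch never looks at `names`, applying the user-supplied names only in the tuple branch would leave the other row types inconsistent, which is the concern raised above.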

