[GitHub] spark issue #20503: [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for R...

HyukjinKwon Sun, 04 Feb 2018 04:47:15 -0800

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20503
  
    I meant things like this:
    
    ```python
    >>> from pyspark.sql import Row
    >>> RowClass = Row(1)
    >>> RowClass("a")
    Row(1='a')
    ```
    
    ```python
    >>> spark.createDataFrame([RowClass("a")])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/session.py", line 686, in createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File "/.../spark/python/pyspark/sql/session.py", line 410, in _createFromLocal
        struct = self._inferSchemaFromList(data, names=schema)
      File "/.../spark/python/pyspark/sql/session.py", line 342, in _inferSchemaFromList
        schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
      File "/.../spark/python/pyspark/sql/session.py", line 342, in <genexpr>
        schema = reduce(_merge_type, (_infer_schema(row, names) for row in data))
      File "/.../spark/python/pyspark/sql/types.py", line 1099, in _infer_schema
        fields = [StructField(k, _infer_type(v), True) for k, v in items]
      File "/.../spark/python/pyspark/sql/types.py", line 407, in __init__
        assert isinstance(name, basestring), "field name should be string"
    AssertionError: field name should be string
    ```
    
    The reason I initially didn't suggest using `str` is that it breaks on `unicode` values in Python 2, IIRC, because `str()` implicitly encodes them with the ASCII codec. For example:
    
    ```
    >>> str(u"아")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\uc544' in position 0: ordinal not in range(128)
    ```
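
The Python 2 failure above can be reproduced on Python 3 by making the implicit ASCII encoding explicit. This is only a minimal sketch: `py2_style_str` is a hypothetical helper written for illustration, not part of PySpark or of the PR, and it assumes that Python 2's `str()` on a `unicode` object encodes with the default ASCII codec.

```python
def py2_style_str(value):
    """Mimic Python 2's str() on a unicode object (hypothetical helper).

    Python 2 implicitly encodes the text with the default ASCII codec,
    so any non-ASCII character raises UnicodeEncodeError.
    """
    return value.encode("ascii")


error = None
try:
    py2_style_str(u"아")
except UnicodeEncodeError as exc:
    # Same failure mode as the Python 2 traceback above.
    error = exc

# On Python 3, str() on text performs no encoding and returns it unchanged.
unchanged = str(u"아")
```

Here `error` ends up holding the `UnicodeEncodeError`, while `unchanged` is the original text, which is why the problem is specific to Python 2.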


