Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18444
  
    While I am here, I just ran some tests as below:
    
    ```python
    from array import array
    
    from pyspark.sql import Row
    
    
    spark.createDataFrame([Row(floatarray=array('f',[1, 2, 3]))]).show()
    spark.createDataFrame([Row(unicodearray=array('u',[u"a", u"b"]))]).show()
    ```
    
    Before
    
    ```python
    >>> spark.createDataFrame([Row(floatarray=array('f',[1,2,3]))]).show()
    ```
    
    ```
    +------------------+
    |        floatarray|
    +------------------+
    |[null, null, null]|
    +------------------+
    ```
    
    ```python
    >>> spark.createDataFrame([Row(unicodearray=array('u',[u"a", 
u"b"]))]).show()
    ```
    
    ```
    +------------+
    |unicodearray|
    +------------+
    |      [a, b]|
    +------------+
    ```
    
    After
    
    ```python
    >>> spark.createDataFrame([Row(floatarray=array('f',[1, 2, 3]))]).show()
    ```
    ```
    +------------------+
    |        floatarray|
    +------------------+
    |[null, null, null]|
    +------------------+
    ```
    
    It still looks like it fills `null`s (although the type now looks like `float` instead of `double`).
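
    As a side note, a rough workaround sketch (not part of this PR): converting the `array` to a plain Python `list` before calling `createDataFrame` sidesteps the typecode-based conversion, so the values come through via the usual `DoubleType` inference:

    ```python
    from array import array

    from pyspark.sql import Row

    # list() yields ordinary Python floats, which are inferred as DoubleType.
    spark.createDataFrame([Row(floatarray=list(array('f', [1, 2, 3])))]).show()
    ```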
    
    ```python
    >>> spark.createDataFrame([Row(unicodearray=array('u',[u"a", 
u"b"]))]).show()
    ```
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/session.py", line 537, in 
createDataFrame
        rdd, schema = self._createFromLocal(map(prepare, data), schema)
      File ".../spark/python/pyspark/sql/session.py", line 401, in 
_createFromLocal
        struct = self._inferSchemaFromList(data)
      File "...spark/python/pyspark/sql/session.py", line 333, in 
_inferSchemaFromList
        schema = reduce(_merge_type, map(_infer_schema, data))
      File ".../spark/python/pyspark/sql/types.py", line 1009, in _infer_schema
        fields = [StructField(k, _infer_type(v), True) for k, v in items]
      File ".../spark/python/pyspark/sql/types.py", line 981, in _infer_type
        raise TypeError("not supported type: array(%s)" % obj.typecode)
    TypeError: not supported type: array(u)
    ```
    
    I think we should not drop this support. I guess the same thing would happen to `c` too.
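
    For reference, a rough sketch of the idea (not this PR's actual change), assuming the text-like typecodes would map to `StringType`. The helper name below is hypothetical; the real change would live in `_infer_type` in `pyspark/sql/types.py`:

    ```python
    from pyspark.sql.types import StringType

    # 'u' is a unicode array; 'c' (char array) only exists on Python 2.
    _TEXT_TYPECODES = {'u', 'c'}

    def _infer_array_element_type(typecode):
        # Hypothetical helper for illustration only: keep accepting the
        # text-like typecodes instead of raising for them.
        if typecode in _TEXT_TYPECODES:
            return StringType()
        raise TypeError("not supported type: array(%s)" % typecode)
    ```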

