[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

icexelloss Fri, 06 Oct 2017 06:03:19 -0700

GitHub user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/18664
  
    Thanks @gatorsmile for the constructive feedback!
    
    I don't want to make this more complicated, but I also want to make sure we 
are aware that the Arrow and non-Arrow versions also differ in how they treat 
array and struct types:
    
    Array:
    ```
    non-Arrow:
    In [47]: type(df2.toPandas().array[0])
    Out[47]: list
    
    Arrow:
    In [45]: type(df2.toPandas().array[0])
    Out[45]: numpy.ndarray
    ```
    
    Struct:
    ```
    Arrow:
    In [35]: type(df.toPandas().struct[0])
    Out[35]: pyspark.sql.types.Row
    
    non-Arrow:
    In [37]: type(df.toPandas().struct[0])
    Out[37]: dict
    ```
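
As a rough sketch only: a caller who needs the same Python types from both code paths could normalize each cell after calling `toPandas()`. The helper name `normalize_cell` is hypothetical and not part of PySpark. It assumes, per the outputs above, that array cells arrive as either `list` or `numpy.ndarray` and struct cells as either `dict` or `pyspark.sql.types.Row`, and it detects the latter two by their `tolist()` and `asDict()` methods rather than by importing either library.

```python
def normalize_cell(value):
    """Hypothetical helper: coerce one toPandas() cell to plain Python containers.

    Duck-typed on purpose so it needs neither numpy nor pyspark imported.
    """
    # pyspark.sql.types.Row is a tuple subclass that exposes asDict(),
    # so check for it before the generic tuple branch below.
    if hasattr(value, "asDict"):
        return {k: normalize_cell(v) for k, v in value.asDict().items()}
    # numpy.ndarray exposes tolist(); numpy scalars do too, but they
    # return a bare Python scalar, which is passed through unchanged.
    if hasattr(value, "tolist"):
        converted = value.tolist()
        return normalize_cell(converted) if isinstance(converted, list) else converted
    if isinstance(value, dict):
        return {k: normalize_cell(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_cell(v) for v in value]
    return value
```

With a pandas DataFrame `pdf` returned by `toPandas()`, something like `pdf["array"].map(normalize_cell)` should then give `list` cells either way. This only hides the difference on the caller's side; it does not change what either code path returns.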
    
    I think there should be a high-level doc capturing all the differences 
between the Arrow and non-Arrow versions.
    
    Unfortunately, I cannot commit much time until Nov, but I am happy to help 
with review and discussion.


