[GitHub] spark pull request #18403: [SPARK-21193][PYTHON] Specify Pandas version in s...

HyukjinKwon Fri, 23 Jun 2017 03:50:22 -0700

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/18403


    [SPARK-21193][PYTHON] Specify Pandas version in setup.py

    ## What changes were proposed in this pull request?
    
    It looks like we missed specifying the Pandas version. This PR proposes to
    fix that. As the code currently stands, the minimum should be Pandas 0.13.0,
    based on my tests.
    
    The minimum could be lowered to 0.11.0 if we remove the `copy` option used in
    `astype`. Using this option does not actually appear to be recommended:

    https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html
    
    > Return a copy when copy = True (be really careful with this!) 
    
    I don't think this code path is particularly hot, so it is probably slightly
    better not to use the option for now.
    
    With Pandas 0.10.0, the conversion starts to produce incorrect dtypes even
    without `copy`. So, this PR proposes to remove `copy` and set the minimum
    version to 0.11.0.
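
For illustration, here is a minimal sketch of how such a minimum version could be declared in a `setup.py`. The extra name `sql` and the constant `MIN_PANDAS_VERSION` are assumptions made for this example and are not taken from the patch itself; `extras_require` is the standard setuptools keyword for optional dependencies.

```python
# Hypothetical excerpt in the style of a setup.py.
# The constant name and the "sql" extra are assumptions for illustration.
MIN_PANDAS_VERSION = "0.11.0"

extras_require = {
    # Optional dependency: only needed for Pandas-related features.
    "sql": ["pandas>=%s" % MIN_PANDAS_VERSION],
}

# In a real setup.py this dict would be passed along as
# setuptools.setup(..., extras_require=extras_require).
```

Keeping the version in one constant means the requirement string changes in a single place if the minimum is raised later.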
    
    **With Pandas 0.13.0** - released, 2014-01
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.12.0** - released, 2013-06
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    without `copy`
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.11.0** - released, 2013-03
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    without `copy`
    
    ```
    a      int32
    b     object
    c       bool
    d    float32
    dtype: object
    ```
    
    **With Pandas 0.10.0** - released, 2012-12
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/sql/dataframe.py", line 1734, in toPandas
        pdf[f] = pdf[f].astype(t, copy=False)
    TypeError: astype() got an unexpected keyword argument 'copy'
    ```
    
    without `copy`
    
    ```
    a      int64  # <- this should be 'int32'
    b     object
    c       bool
    d    float64  # <- this should be 'float32'
    ```
    
    
    ## How was this patch tested?
    
    Manually tested with Pandas from 0.10.0 to 0.13.0.
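
The tested range above could also be expressed as a small version comparison. This is only a sketch using the standard library: the helper names `parse_version` and `meets_minimum` are hypothetical, are not part of this patch, and assume purely numeric, dot-separated version strings.

```python
def parse_version(version):
    """Turn '0.13.0' into (0, 13, 0).

    Assumes purely numeric, dot-separated parts (no 'rc1' or 'dev' suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum="0.11.0"):
    """Return True if `installed` is at least `minimum`.

    Tuples compare element by element, so '0.9.1' correctly sorts
    below '0.11.0', which a plain string comparison would get wrong.
    """
    return parse_version(installed) >= parse_version(minimum)


# The versions exercised in the manual test above.
for tested in ["0.10.0", "0.11.0", "0.12.0", "0.13.0"]:
    print(tested, meets_minimum(tested))
```

Under these assumptions, only 0.10.0 falls below the proposed minimum, which matches the manual results.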


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-21193

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18403
    
----
commit 4ddc54b19b3ab036b49cbc9ce955c34bb6625c3a
Author: hyukjinkwon <[email protected]>
Date:   2017-06-23T10:43:42Z

    Specify Pandas version in setup.py

----


