Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19319#discussion_r140488022
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1891,14 +1892,19 @@ def toPandas(self):
                          "if using spark.sql.execution.arrow.enable=true"
                    raise ImportError("%s\n%s" % (e.message, msg))
            else:
+                pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
+
                dtype = {}
                for field in self.schema:
                    pandas_type = _to_corrected_pandas_type(field.dataType)
-                    if pandas_type is not None:
+                    # SPARK-21766: if an integer field is nullable and has null values, it can be
+                    # inferred by pandas as float column. Once we convert the column with NaN back
+                    # to integer type e.g., np.int16, we will hit exception.
--- End diff ---
Added. Thanks.
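
For context (not part of the patch), a minimal standalone sketch of the behaviour the comment describes, using plain pandas/NumPy and a hypothetical "age" column rather than the Spark code path:

    import numpy as np
    import pandas as pd

    # A nullable integer column collected with a None is stored by pandas as NaN,
    # so the whole column is inferred as float64 rather than an integer dtype.
    pdf = pd.DataFrame({"age": [1, None, 3]})
    print(pdf["age"].dtype)  # float64

    # Downcasting that column to an integer type such as np.int16 raises because of the NaN.
    try:
        pdf["age"].astype(np.int16)
    except ValueError as err:
        print(err)

    # The guard discussed in the diff: only apply the corrected integer dtype
    # when the column contains no null values.
    if not pdf["age"].isnull().any():
        pdf["age"] = pdf["age"].astype(np.int16)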
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]