Github user a10y commented on a diff in the pull request:
https://github.com/apache/spark/pull/18945#discussion_r139450187
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1810,17 +1810,20 @@ def _to_scala_map(sc, jm):
return sc._jvm.PythonUtils.toScalaMap(jm)
-def _to_corrected_pandas_type(dt):
+def _to_corrected_pandas_type(field, strict=True):
"""
When converting Spark SQL records to Pandas DataFrame, the inferred
data type may be wrong.
This method gets the corrected data type for Pandas if that type may
be inferred incorrectly.
"""
import numpy as np
+ dt = field.dataType
if type(dt) == ByteType:
return np.int8
elif type(dt) == ShortType:
return np.int16
elif type(dt) == IntegerType:
+ if not strict and field.nullable:
+ return np.float32
--- End diff ---
Is loss of precision a concern here? float32 has only a 24-bit
significand, so any integer in the original dataset with magnitude
above 2**24 will be rounded to the nearest representable float32
value, if I'm not mistaken.
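
A minimal illustration of the effect with plain NumPy (not code from
the PR; this just shows the rounding the cast would introduce):

    import numpy as np

    x = 2**24 + 1          # 16777217, exactly representable as int32
    y = np.float32(x)      # cast to a 24-bit-significand float
    print(y)               # 16777216.0, rounded to the nearest float32
    print(int(y) == x)     # False: the original value cannot be recovered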
---