[GitHub] [spark] HyukjinKwon commented on a change in pull request #26747: [SPARK-29188][PySpark] toPandas gets wrong dtypes when applied on empty DF

GitBox Tue, 03 Dec 2019 16:53:59 -0800

HyukjinKwon commented on a change in pull request #26747: 
[SPARK-29188][PySpark] toPandas gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#discussion_r353500751


 ##########
 File path: python/pyspark/sql/dataframe.py
 ##########
 @@ -2305,16 +2305,15 @@ def _to_corrected_pandas_type(dt):
     uncorrectly.
     """
     import numpy as np
-    if type(dt) == ByteType:
-        return np.int8
-    elif type(dt) == ShortType:
-        return np.int16
-    elif type(dt) == IntegerType:
-        return np.int32
-    elif type(dt) == FloatType:
-        return np.float32
-    else:
-        return None
+    mappings = {
 
 Review comment:
   We should list all the data types here. Initially, the purpose of this 
mapping was only to correct pandas's inferred type.
   Now, with empty data, pandas always infers `object`, so the result relies 
entirely on this type mapping, unlike the originally intended case.
   
   See `to_arrow_type` for an example of a complete type mapping. You might need 
to check which Spark -> Python -> pandas type conversion combinations are valid 
and whitelist them here.
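
   As a rough illustration of the kind of whitelist meant above, here is a 
minimal sketch. It is not the code from this PR: the names `_CORRECTED_DTYPES`, 
`corrected_pandas_dtype` and `dtypes_for_schema` are hypothetical, the keys are 
the strings returned by `DataType.typeName()`, and the set of types included is 
an assumption that would still need to be verified against the actual 
conversion behavior.

```python
# Hypothetical sketch, not the PR's implementation.
# Keys: Spark SQL type names as returned by DataType.typeName().
# Values: NumPy/pandas dtype strings to enforce on the resulting column.
# Which types belong in this whitelist is an assumption to be verified.
_CORRECTED_DTYPES = {
    "byte": "int8",
    "short": "int16",
    "integer": "int32",
    "long": "int64",
    "float": "float32",
    "double": "float64",
    "boolean": "bool",
    "timestamp": "datetime64[ns]",
}


def corrected_pandas_dtype(spark_type_name):
    """Return the dtype string to enforce, or None to keep pandas's inference."""
    return _CORRECTED_DTYPES.get(spark_type_name)


def dtypes_for_schema(fields):
    """Map (column name, Spark type name) pairs to the dtypes to enforce.

    Columns whose type is not whitelisted are left out, so they keep
    whatever dtype pandas inferred (``object`` for an empty frame).
    """
    result = {}
    for name, type_name in fields:
        dtype = corrected_pandas_dtype(type_name)
        if dtype is not None:
            result[name] = dtype
    return result
```

   The resulting dict has the shape that pandas's `DataFrame.astype` accepts, 
so an empty frame could be cast column by column. Columns containing nulls 
would need separate handling, since NumPy integer dtypes cannot hold missing 
values.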

