HyukjinKwon commented on a change in pull request #26747: [SPARK-29188][PySpark] toPandas gets wrong dtypes when applied on empty DF
URL: https://github.com/apache/spark/pull/26747#discussion_r353500751

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2305,16 +2305,15 @@ def _to_corrected_pandas_type(dt):
     uncorrectly.
     """
     import numpy as np
-    if type(dt) == ByteType:
-        return np.int8
-    elif type(dt) == ShortType:
-        return np.int16
-    elif type(dt) == IntegerType:
-        return np.int32
-    elif type(dt) == FloatType:
-        return np.float32
-    else:
-        return None
+    mappings = {

Review comment:
We should list all the data types here. Originally this function existed only to correct the type that pandas inferred. Now, when the data is empty, pandas always infers `object`, so the result depends entirely on this type mapping, which was not the case before. See `to_arrow_type` for an example of a complete type mapping. You might need to check which Spark -> Python -> pandas type conversion combinations are valid and whitelist them here.
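
For illustration, here is a minimal sketch of the dictionary-based lookup that the new `mappings = {` line starts. It is not the code from the patch. The classes are stand-ins so the snippet runs without Spark installed (in PySpark they would come from `pyspark.sql.types`), the function name `to_corrected_pandas_type` is hypothetical, and dtypes are given as name strings rather than NumPy objects. Only the first four entries come from the diff above; the remaining ones are assumptions that would need checking against the actual Spark -> Python -> pandas conversions.

```python
# Stand-in classes so this sketch runs without Spark installed.
# In PySpark these types live in pyspark.sql.types.
class DataType:
    pass

class ByteType(DataType): pass
class ShortType(DataType): pass
class IntegerType(DataType): pass
class LongType(DataType): pass
class FloatType(DataType): pass
class DoubleType(DataType): pass
class BooleanType(DataType): pass
class StringType(DataType): pass

# Spark type class -> dtype name (pandas accepts these names as dtypes).
DTYPE_BY_SPARK_TYPE = {
    # Entries taken from the if/elif chain removed in the diff.
    ByteType: "int8",
    ShortType: "int16",
    IntegerType: "int32",
    FloatType: "float32",
    # Assumed additions, not from the patch; verify before relying on them.
    LongType: "int64",
    DoubleType: "float64",
    BooleanType: "bool",
}

def to_corrected_pandas_type(dt):
    """Return the dtype name for a Spark type instance, or None if unlisted."""
    return DTYPE_BY_SPARK_TYPE.get(type(dt))

print(to_corrected_pandas_type(ByteType()))    # listed in the mapping
print(to_corrected_pandas_type(StringType()))  # not listed, so None
```

Returning `None` for unlisted types keeps the behaviour of the old `else` branch, so any type left out of the mapping still falls back to whatever pandas infers, which is `object` for an empty frame.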