Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/18378#discussion_r181567408
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm):
return sc._jvm.PythonUtils.toScalaMap(jm)
+def _to_corrected_pandas_type(dt):
+    """
+    When converting Spark SQL records to a Pandas DataFrame, the inferred
+    data type may be wrong. This method gets the corrected data type for
+    Pandas if that type may be inferred incorrectly.
+    """
+    import numpy as np
+    if type(dt) == ByteType:
+        return np.int8
+    elif type(dt) == ShortType:
+        return np.int16
+    elif type(dt) == IntegerType:
+        return np.int32
+    elif type(dt) == FloatType:
+        return np.float32
+    else:
--- End diff ---
Yup, it was unfortunate, but it was a bug that we should fix. Does that
cause an actual break, or just a unit test failure?
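
For readers outside the PR thread, the dtype-correction mapping in the diff
above can be sketched as a small standalone snippet. This is a hedged
illustration, not the PySpark implementation: the Spark SQL type classes
(ByteType, ShortType, IntegerType, FloatType) live in pyspark.sql.types, so
plain strings stand in for them here.

```python
import numpy as np

# Narrower NumPy dtypes for Spark SQL types that pandas would otherwise
# widen (e.g. byte/short/int all inferred as int64, float as float64).
# Keys are stand-ins for the pyspark.sql.types classes named in the diff.
_CORRECTED = {
    "ByteType": np.int8,      # Spark byte  -> 8-bit signed int
    "ShortType": np.int16,    # Spark short -> 16-bit signed int
    "IntegerType": np.int32,  # Spark int   -> 32-bit signed int
    "FloatType": np.float32,  # Spark float -> 32-bit float
}

def to_corrected_pandas_type(type_name):
    # Return the corrected NumPy dtype, or None to keep pandas' inference
    # (the `else` branch truncated in the quoted diff).
    return _CORRECTED.get(type_name)
```

For example, `to_corrected_pandas_type("ByteType")` yields `numpy.int8`,
while an unlisted type such as a long or string returns None, leaving the
dtype to pandas' default inference.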
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]