GitHub user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r142243923
--- Diff: python/pyspark/sql/types.py ---
@@ -1624,6 +1624,34 @@ def toArrowType(dt):
     return arrow_type
+def from_pandas_type(dt):
+    """ Convert pandas data type to Spark data type
+    """
+    import pandas as pd
+    import numpy as np
+    if dt == np.int32:
+        return IntegerType()
+    elif dt == np.int64:
+        return LongType()
+    elif dt == np.float32:
+        return FloatType()
+    elif dt == np.float64:
+        return DoubleType()
+    elif dt == np.object:
+        return StringType()
--- End diff --
Aren't there other types besides strings that show up as plain `object`
dtype? I think it would be better to use Arrow to map the Pandas dtype to an
Arrow type, and then add a `def from_arrow_type(t)` to map Arrow to Spark.
That would be easier to support, and we already have a similar type
conversion in Scala.
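
To make the concern concrete, here is a rough sketch of the two-stage idea. It is not the Arrow-based implementation proposed above: it uses only the standard library so it runs without pandas, Arrow or Spark installed. The logical type names stand in for Arrow types, the Spark types are returned as class-name strings, and every function and table name in it (`infer_logical_type`, `from_logical_type`, `spark_type_for_object_column`) is hypothetical.

```python
import datetime
import decimal

# Stage 1 table: Python value type -> logical type name.
# The names are hypothetical stand-ins for Arrow types.
_PY_TO_LOGICAL = [
    (bool, "bool"),
    (int, "int64"),
    (float, "float64"),
    (str, "string"),
    (bytes, "binary"),
    (decimal.Decimal, "decimal"),
    (datetime.datetime, "timestamp"),
    (datetime.date, "date"),
]

# Stage 2 table: logical type name -> Spark SQL type class name.
# Strings are used instead of pyspark classes so no Spark install is needed.
_LOGICAL_TO_SPARK = {
    "bool": "BooleanType",
    "int64": "LongType",
    "float64": "DoubleType",
    "string": "StringType",
    "binary": "BinaryType",
    "decimal": "DecimalType",
    "timestamp": "TimestampType",
    "date": "DateType",
}


def infer_logical_type(values):
    """Infer one logical type from the non-null values of an object column."""
    found = set()
    for v in values:
        if v is None:
            continue
        for py_type, name in _PY_TO_LOGICAL:
            # Exact type match, so bool is not mistaken for int and
            # datetime is not mistaken for date.
            if type(v) is py_type:
                found.add(name)
                break
        else:
            raise TypeError("unsupported value type: %s" % type(v).__name__)
    if len(found) != 1:
        raise TypeError("expected exactly one value type, found: %s" % sorted(found))
    return found.pop()


def from_logical_type(name):
    """Map a logical type name to a Spark type name (stand-in for from_arrow_type)."""
    try:
        return _LOGICAL_TO_SPARK[name]
    except KeyError:
        raise TypeError("unsupported logical type: %s" % name)


def spark_type_for_object_column(values):
    """Run both stages: inspect the values, then map the result to Spark."""
    return from_logical_type(infer_logical_type(values))


print(spark_type_for_object_column(["a", None, "b"]))
print(spark_type_for_object_column([b"\x00\x01"]))
print(spark_type_for_object_column([decimal.Decimal("1.5")]))
```

The three calls all describe columns that pandas would report as `object`, yet only the first one holds strings. Looking at the values rather than the dtype is what lets the second stage pick a different Spark type for each.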