Re: [PR] [SPARK-54962][PYTHON] Fix nullable integers handling in Pandas UDF [spark]

via GitHub Thu, 08 Jan 2026 03:46:03 -0800


zhengruifeng commented on code in PR #53730:
URL: https://github.com/apache/spark/pull/53730#discussion_r2672035587



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -842,6 +842,32 @@ def _to_corrected_pandas_type(dt: DataType) -> Optional[Any]:
         return None
 
 
+def _to_corrected_pandas_ext_type(dt: DataType) -> Optional[Any]:
+    """
+    Convert spark datatype to a Pandas extension type which support nullable data.
+    """
+    import pandas as pd
+
+    if type(dt) == ByteType:
+        return pd.Int8Dtype()
+    elif type(dt) == ShortType:
+        return pd.Int16Dtype()
+    elif type(dt) == IntegerType:
+        return pd.Int32Dtype()
+    elif type(dt) == LongType:
+        return pd.Int64Dtype()
+    elif type(dt) == FloatType:
+        return pd.Float32Dtype()
+    elif type(dt) == DoubleType:
+        return pd.Float64Dtype()
+    elif type(dt) == BooleanType:
+        return pd.BooleanDtype()
+    elif type(dt) == StringType:
+        return pd.StringDtype()
+    else:
+        return None
+
+
 @functools.lru_cache(maxsize=64)
 def _create_converter_to_pandas(

Review Comment:
   This function is used in multiple places, so I added an argument that
   defaults to false, so the change only takes effect in Pandas UDFs.
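
   A minimal sketch of the opt-in pattern described above, not the PR's actual
   code. The helper name `to_series` and the flag name `use_ext_dtype` are
   hypothetical and chosen only for illustration. It shows why the extension
   dtype matters for nullable integers (plain pandas inference falls back to
   float64 when a null is present) and how a default-false argument leaves
   existing callers unchanged.

```python
from typing import Optional, Sequence

import pandas as pd


def to_series(values: Sequence[Optional[int]], use_ext_dtype: bool = False) -> pd.Series:
    """Hypothetical helper (not part of PySpark) illustrating an opt-in flag."""
    if use_ext_dtype:
        # Opt-in path: the nullable extension dtype keeps integers as integers
        # and represents missing values as pd.NA.
        return pd.Series(values, dtype=pd.Int64Dtype())
    # Default path: unchanged behavior; pandas infers float64 because None
    # is converted to NaN.
    return pd.Series(values)


default = to_series([1, None, 3])
nullable = to_series([1, None, 3], use_ext_dtype=True)

print(default.dtype)   # float64
print(nullable.dtype)  # Int64
```

   Because the flag defaults to false, only call sites that pass it explicitly
   (in this PR, the Pandas UDF path) would see the extension dtype.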



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

