Re: [PR] [SPARK-54962][PYTHON] Fix nullable integers handling in Pandas UDF [spark]

via GitHub Thu, 08 Jan 2026 15:52:22 -0800


gaogaotiantian commented on code in PR #53730:
URL: https://github.com/apache/spark/pull/53730#discussion_r2674282915



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -842,6 +842,32 @@ def _to_corrected_pandas_type(dt: DataType) -> Optional[Any]:
         return None
 
 
+def _to_corrected_pandas_ext_type(dt: DataType) -> Optional[Any]:
+    """
+    Convert spark datatype to a Pandas extension type which support nullable data.
+    """
+    import pandas as pd
+
+    if type(dt) == ByteType:

Review Comment:
   Performance probably doesn't matter much here, but we should call
   `type(dt)` once and reuse the result instead of calling it repeatedly.

   Also, could we use `is` for the type comparison? `isinstance` is an
   alternative if we don't need an exact type match. The linter discourages
   comparing types with `==`; we ignore that warning for now (I will change
   it in the future).
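
A minimal sketch of the suggestion, not the PR's actual code: the classes below are stand-ins so the snippet runs without pyspark, `TinyByteType` and `to_ext_type_name` are hypothetical names, and the returned strings are illustrative labels rather than real pandas dtype objects. It looks up the type once and compares it with `is`:

```python
from typing import Optional


# Stand-in classes (assumption) so the sketch runs without pyspark;
# in the PR these would be imported from pyspark.sql.types.
class DataType:
    pass


class ByteType(DataType):
    pass


class ShortType(DataType):
    pass


class TinyByteType(ByteType):
    """Hypothetical subclass, used only to contrast `is` with `isinstance`."""


def to_ext_type_name(dt: DataType) -> Optional[str]:
    """Return an illustrative extension-type label for an exact type match."""
    dt_type = type(dt)  # compute the type once and reuse it below
    if dt_type is ByteType:  # exact match: subclasses do not qualify
        return "Int8"
    if dt_type is ShortType:
        return "Int16"
    return None
```

With an exact `is` check, `to_ext_type_name(TinyByteType())` returns `None` even though `isinstance(TinyByteType(), ByteType)` is true, which is the difference between the two options mentioned above.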



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

