Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

via GitHub Wed, 06 Nov 2024 12:21:11 -0800


harshmotw-db commented on code in PR #48770:
URL: https://github.com/apache/spark/pull/48770#discussion_r1831660768



##########
python/pyspark/sql/pandas/types.py:
##########
@@ -1295,6 +1306,16 @@ def convert_udt(value: Any) -> Any:
 
             return convert_udt
 
+        elif isinstance(dt, VariantType):
+            def convert_variant(variant: Any) -> Any:

Review Comment:
   IIUC, this is the inverse of [this 
function](https://github.com/apache/spark/blob/876d5cab7ffeb55b86516350af23aca8cae35afe/python/pyspark/sql/pandas/types.py#L1043),
which converts Arrow data into pandas data that UDFs can process. The new 
function converts a VariantVal back into an Arrow-compatible value so that it 
can be sent downstream via Arrow.
   
   Essentially, when variants arrive via Arrow, the Arrow type is struct, so 
the corresponding Python type is dict. The linked function converts this dict 
to a VariantVal. A VariantVal returned from a UDF must then be sent downstream 
via Arrow, and this function performs that reverse conversion.
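   
   A minimal sketch of the round trip described above, not the code from the 
PR. `VariantValStandIn`, `dict_to_variant`, and `variant_to_dict` are 
hypothetical names used only for illustration, the `value` and `metadata` field 
names are an assumption about the struct layout, and the byte strings are 
arbitrary placeholders rather than a valid variant encoding.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class VariantValStandIn:
    """Hypothetical stand-in for VariantVal: two opaque binary fields."""

    value: bytes
    metadata: bytes


def dict_to_variant(struct: Any) -> VariantValStandIn:
    """Input direction: a struct received via Arrow shows up as a dict."""
    if not isinstance(struct, dict):
        raise TypeError("expected a dict for a variant struct")
    value = struct.get("value")
    metadata = struct.get("metadata")
    if not isinstance(value, bytes) or not isinstance(metadata, bytes):
        raise ValueError("variant struct needs bytes 'value' and 'metadata'")
    return VariantValStandIn(value, metadata)


def variant_to_dict(variant: Any) -> Dict[str, bytes]:
    """Output direction: turn the variant back into a struct-like dict."""
    if not isinstance(variant, VariantValStandIn):
        raise TypeError("expected a variant value")
    return {"value": variant.value, "metadata": variant.metadata}


# Placeholder bytes, chosen only to show that the two functions are inverses.
incoming = {"value": b"\x0c\x01", "metadata": b"\x01\x00\x00"}
variant = dict_to_variant(incoming)
outgoing = variant_to_dict(variant)
```

   Under these assumptions, `outgoing` equals `incoming`, which is the sense in 
which the two conversions are inverses of each other.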



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

