Yicong-Huang commented on code in PR #53952:
URL: https://github.com/apache/spark/pull/53952#discussion_r2730043004
##########
python/pyspark/sql/conversion.py:
##########
@@ -63,17 +63,18 @@ class ArrowBatchTransformer:
     """
 
     @staticmethod
-    def flatten_struct(batch: "pa.RecordBatch") -> "pa.RecordBatch":
+    def flatten_struct(batch: "pa.RecordBatch", column_index: int = 0) -> "pa.RecordBatch":
         """
-        Flatten a single struct column into a RecordBatch.
+        Flatten a struct column at given index into a RecordBatch.
 
         Used by:
         - ArrowStreamUDFSerializer.load_stream
         - GroupArrowUDFSerializer.load_stream
+        - ArrowStreamArrowUDTFSerializer.load_stream
         """
         import pyarrow as pa
 
-        struct = batch.column(0)
+        struct = batch.column(column_index)
         return pa.RecordBatch.from_arrays(struct.flatten(), schema=pa.schema(struct.type))
Review Comment:
I don’t think this is over-engineering. We currently have similar logic duplicated across multiple serializers, and this consolidates those patterns into one discoverable place. These are still simple, pure functions that can be tested independently and composed for readability, so the abstraction is mainly about organization and reuse rather than adding real complexity.

If we later find that this layer isn’t pulling its weight, we can always inline the functions back into the serializers after the cleanup. For now, having a single place to centralize and deduplicate this logic makes the codebase easier to reason about. WDYT?
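
For concreteness, here is a minimal sketch of what the helper boils down to. This is plain pyarrow with made-up toy data, not actual PySpark test code; `struct_arr`, `batch`, and `flat` are just local names for illustration:

```python
import pyarrow as pa

# A one-column batch whose only column is a struct<a: int64, b: string>.
struct_arr = pa.array([{"a": 1, "b": "x"}, {"a": 2, "b": "y"}])
batch = pa.RecordBatch.from_arrays([struct_arr], names=["s"])

# What flatten_struct(batch, column_index=0) does: unpack the struct's
# child arrays into top-level columns, keeping the struct's field names.
struct = batch.column(0)
flat = pa.RecordBatch.from_arrays(struct.flatten(), schema=pa.schema(struct.type))

print(flat.schema)  # a: int64, b: string
```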
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]