Re: [PR] [SPARK-54392][SS] Optimize JVM-Python communication for TWS initial state [spark]

via GitHub Fri, 05 Dec 2025 02:48:27 -0800


HeartSaVioR commented on code in PR #53122:
URL: https://github.com/apache/spark/pull/53122#discussion_r2592232421



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -2236,15 +2237,15 @@ def row_iterator():
             for batch in batches:
                 # Detect which column has data - each batch contains only one type
                 input_result = extract_rows(batch, "inputData", self.key_offsets)
+                init_result = extract_rows(batch, "initState", self.init_key_offsets)
 
                 if input_result is not None:

Review Comment:
   nit: same here, XOR?
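The same XOR check could be applied to the `extract_rows` results in the hunk above, as in the following sketch. It assumes `extract_rows` returns `None` when a batch has no rows for that column, which is what the `is not None` test in the diff suggests. `pick_result` is a hypothetical helper for illustration, not PySpark code.

```python
def pick_result(input_result, init_result):
    """Hypothetical helper: return (column_name, rows) for whichever column has data."""
    # Each batch should carry rows for exactly one of the two columns,
    # so exactly one of these None checks should be True.
    assert (input_result is not None) ^ (init_result is not None), (
        "each batch should contain only one of inputData / initState"
    )
    # Tag the rows with their source column so the caller can dispatch on it.
    if input_result is not None:
        return "inputData", input_result
    return "initState", init_result
```

For example, `pick_result([1], None)` returns `("inputData", [1])`, while passing `None` for both arguments trips the assertion.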



##########
python/pyspark/sql/pandas/serializers.py:
##########
@@ -2009,20 +2009,21 @@ def row_stream():
                         for i, c in enumerate(flatten_state_table.itercolumns())
                     ]
 
+                    flatten_init_table = flatten_columns(batch, "initState")
+                    init_data_pandas = [
+                        self.arrow_to_pandas(c, i)
+                        for i, c in enumerate(flatten_init_table.itercolumns())
+                    ]
+
                     if bool(data_pandas):

Review Comment:
   nit: Probably just assert with XOR here to confirm that either bool(data_pandas) or bool(init_data_pandas) is True and the other is False?
   
   ```
   >>> True^True
   False
   >>> True^False
   True
   >>> False^True
   True
   >>> False^False
   False
   ```
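A minimal sketch of that assertion, assuming `data_pandas` and `init_data_pandas` are the lists built in the hunk above. `check_exactly_one` is a hypothetical name used only for illustration.

```python
def check_exactly_one(data_pandas: list, init_data_pandas: list) -> bool:
    """Hypothetical guard: return True if the batch carries input rows,
    False if it carries initial-state rows; fail if it carries both or neither."""
    has_input = bool(data_pandas)
    has_init = bool(init_data_pandas)
    # On bools, ^ is True only when exactly one operand is True,
    # matching the truth table in the comment.
    assert has_input ^ has_init, (
        "expected exactly one of inputData/initState, "
        f"got input={has_input}, init={has_init}"
    )
    return has_input
```

Because Python skips `assert` statements when run with `-O`, this works as a sanity check during development rather than as validation of untrusted input.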


