bogao007 commented on code in PR #48290:
URL: https://github.com/apache/spark/pull/48290#discussion_r1792291361


##########
python/pyspark/sql/streaming/list_state_client.py:
##########
@@ -78,8 +78,11 @@ def get(self, state_name: str, iterator_id: str) -> Tuple:
             status = response_message[0]
             if status == 0:
                 iterator = self._stateful_processor_api_client._read_arrow_state()
-                batch = next(iterator)
-                pandas_df = batch.to_pandas()
+                data_batch = None

Review Comment:
   The previous code would get stuck forever after we added the arrow resource cleanup logic (I think it's because the previous logic did not exhaust the iterator, even though that iterator only contained a single batch), hence switching to the recommended way of consuming the arrow batches:
   ```
   for batch in iterator:
       ......
   ```
   The logic is the same as before; we only need to consume a single batch here.
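   To make the pattern concrete, here is a minimal standalone sketch of consuming a single-batch arrow stream by iteration; it assumes a PyArrow `RecordBatchStreamReader`, and the `read_single_batch` helper name is hypothetical, not part of this PR:
   ```
   import pandas as pd
   import pyarrow as pa

   def read_single_batch(reader: pa.RecordBatchStreamReader) -> pd.DataFrame:
       # Iterate instead of calling next() exactly once, so the stream is
       # fully exhausted and its underlying resources can be released by
       # the cleanup logic.
       data_batch = None
       for batch in reader:
           # We expect exactly one batch; keep the first one we see.
           if data_batch is None:
               data_batch = batch
       if data_batch is None:
           raise ValueError("no batch received from the arrow stream")
       return data_batch.to_pandas()
   ```
   Exhausting the iterator (rather than stopping after a single `next()`) is presumably what allows the reader to close cleanly when the cleanup logic runs.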


