bogao007 commented on code in PR #48290:
URL: https://github.com/apache/spark/pull/48290#discussion_r1792291361
##########
python/pyspark/sql/streaming/list_state_client.py:
##########
@@ -78,8 +78,11 @@ def get(self, state_name: str, iterator_id: str) -> Tuple:
         status = response_message[0]
         if status == 0:
             iterator = self._stateful_processor_api_client._read_arrow_state()
-            batch = next(iterator)
-            pandas_df = batch.to_pandas()
+            data_batch = None
Review Comment:
The previous code would get stuck forever after we added the Arrow resource
cleanup logic (I think this is because the previous logic did not exhaust
the iterator, even though that iterator only contained a single batch), hence
the switch to the recommended way of consuming Arrow batches:
```
for batch in iterator:
    ......
```
The logic is the same as before; we still only need to consume a single
batch here.
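
For illustration, here is a minimal, self-contained sketch of the pattern
described above (not the actual PR code): the Arrow batch iterator is fully
drained with a `for` loop so the cleanup logic can release its resources,
while only the first batch is kept. The `read_arrow_state` generator below
is a hypothetical stand-in for
`self._stateful_processor_api_client._read_arrow_state()`.
```
import pyarrow as pa

def read_arrow_state():
    # Hypothetical stand-in for _read_arrow_state(), which yields
    # pyarrow RecordBatches streamed from the state server.
    yield pa.RecordBatch.from_pydict({"value": [1, 2, 3]})

iterator = read_arrow_state()
data_batch = None
for batch in iterator:
    if data_batch is None:
        # The stream is expected to carry a single batch; keep it.
        data_batch = batch
    # Keep looping so the iterator is exhausted and the Arrow
    # resources can actually be released.

if data_batch is None:
    raise StopIteration("no batch received")
pandas_df = data_batch.to_pandas()
```
Exhausting the iterator, rather than calling `next()` once and abandoning
it, is what avoids the hang introduced by the resource cleanup logic.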
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]