HeartSaVioR commented on code in PR #47933:
URL: https://github.com/apache/spark/pull/47933#discussion_r1770702214
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -99,25 +99,25 @@ def exists(self) -> bool:
"""
return self._list_state_client.exists(self._state_name)
- def get(self) -> Iterator[Row]:
+ def get(self) -> Iterator[Tuple]:
Review Comment:
Wait, I see we provide "Row" in ValueState.get(), and we build a Row instance to match that signature. Are we going to diverge the UX, or is the type hint incorrect and list state also builds a Row instance per element?
We don't need to strictly match ApplyInPandasWithState, but we do need to be consistent among state types in TransformWithStateInPandas.
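To make the two shapes concrete, here is a minimal self-contained sketch of the user-visible difference (the stored values, field name, and helper functions are illustrative only, not the PR's actual implementation):
```python
from typing import Iterator, Tuple
from pyspark.sql import Row

# Simulated list-state contents; real values come from the state store.
stored = [(3,), (5,)]

# Option A: list state yields plain tuples, as the new type hint says.
def get_as_tuples() -> Iterator[Tuple]:
    return iter(stored)

# Option B: list state builds a Row per element, matching ValueState.get().
def get_as_rows(field_names: Tuple[str, ...] = ("value",)) -> Iterator[Row]:
    return (Row(**dict(zip(field_names, t))) for t in stored)

print(next(get_as_tuples()))  # (3,)
print(next(get_as_rows()))    # Row(value=3)
```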
##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -99,25 +99,25 @@ def exists(self) -> bool:
"""
return self._list_state_client.exists(self._state_name)
- def get(self) -> Iterator[Row]:
+ def get(self) -> Iterator[Tuple]:
Review Comment:
I'm OK either way: 1) be consistent with Tuple for both state reads and writes, or 2) allow Tuple for state writes but provide Row for state reads to stay strict with the schema.
I don't even think it's a crazy idea to enforce Row for state writes as well, for consistency. Though I kind of agree that Tuple is probably easier for users to deal with.
cc. @HyukjinKwon
Do we have a preference in PySpark for such a scenario? We use Row internally, and we need to decide whether we expose Row as-is or expose a more convenient type to users and handle the conversion internally.
We previously used Tuple for applyInPandasWithState, so the preference is probably the latter, but I wanted to double-check before moving on. A sketch of option 2 follows.
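For reference, a hypothetical sketch of option 2, where the client accepts tuples on write but converts each element to a Row on read (the field names and helper here are assumptions for illustration, not the PR's code):
```python
from typing import Any, Iterator, Tuple
from pyspark.sql import Row

# Assumed state schema field names for this example.
STATE_FIELDS = ["user_id", "count"]

def rows_from_tuples(raw: Iterator[Tuple[Any, ...]]) -> Iterator[Row]:
    # Zip each stored tuple against the schema so reads stay strict.
    for t in raw:
        yield Row(**dict(zip(STATE_FIELDS, t)))

# Tuple writes round-trip to Row reads.
for row in rows_from_tuples(iter([("a", 1), ("b", 2)])):
    print(row)  # Row(user_id='a', count=1), then Row(user_id='b', count=2)
```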
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]