HeartSaVioR commented on code in PR #47933:
URL: https://github.com/apache/spark/pull/47933#discussion_r1770702214


##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -99,25 +99,25 @@ def exists(self) -> bool:
         """
         return self._list_state_client.exists(self._state_name)
 
-    def get(self) -> Iterator[Row]:
+    def get(self) -> Iterator[Tuple]:

Review Comment:
   Wait, I see we return "Row" in ValueState.get(), and we build a Row instance 
to match the signature. Are we going to diverge the UX here, or is this type 
hint incorrect and list state also builds a Row instance per element?
   
   We don't need to strictly match ApplyInPandasWithState, but we do need to 
be consistent across state types in TransformWithStateInPandas.



##########
python/pyspark/sql/streaming/stateful_processor.py:
##########
@@ -99,25 +99,25 @@ def exists(self) -> bool:
         """
         return self._list_state_client.exists(self._state_name)
 
-    def get(self) -> Iterator[Row]:
+    def get(self) -> Iterator[Tuple]:

Review Comment:
   I'm OK with either option: 1) use Tuple consistently for both state read 
and write, or 2) allow Tuple for state write but return Row for state read, to 
be strict about the schema.
   
   I don't even think it would be a crazy idea to enforce Row for state write 
as well, for consistency, though I kind of agree that Tuple is probably easier 
for users to deal with.
   
   cc. @HyukjinKwon 
   Do we have a preference in PySpark for this kind of scenario? We use Row 
internally, and need to decide whether to expose Row as is, or to expose 
whatever type is most convenient for users and handle the conversion internally.
   We previously used Tuple for applyInPandasWithState, so the preference is 
probably the latter, but I wanted to double-check before moving on.
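
   To make option 2 concrete, here is a minimal sketch of "Tuple for state 
write, schema-aware row for state read". It is not the PySpark implementation: 
`InMemoryListState` and `StateRow` are hypothetical names, and 
`collections.namedtuple` stands in for Row so that the snippet runs without 
Spark.

```python
from collections import namedtuple
from typing import Iterator, List, Sequence, Tuple


class InMemoryListState:
    """Hypothetical stand-in for a list state handle (not the PySpark class)."""

    def __init__(self, field_names: Sequence[str]) -> None:
        # Assumption: a namedtuple plays the role of Row, i.e. a tuple whose
        # fields can also be accessed by name according to the state schema.
        self._row_type = namedtuple("StateRow", field_names)
        self._values: List[Tuple] = []

    def put(self, new_state: List[Tuple]) -> None:
        # Write path: accept plain tuples. Building the row type first raises
        # TypeError if a tuple's arity does not match the schema.
        self._values = [tuple(self._row_type(*value)) for value in new_state]

    def get(self) -> Iterator[Tuple]:
        # Read path: build one schema-aware row per stored element.
        return (self._row_type(*value) for value in self._values)


state = InMemoryListState(["user", "count"])
state.put([("alice", 1), ("bob", 2)])
rows = list(state.get())
```

   Because the row type here is a tuple subclass (as PySpark's Row is), values 
returned from `get()` still compare equal to plain tuples and still satisfy an 
`Iterator[Tuple]` hint, so under this sketch the two options differ mainly in 
whether named field access is available on read.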



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

