Re: [PR] [SPARK-52333][SS][PYTHON] Squeeze the protocol of retrieving timers for transformWithState in PySpark [spark]

via GitHub Wed, 28 May 2025 12:26:42 -0700


jingz-db commented on PR #51036:
URL: https://github.com/apache/spark/pull/51036#issuecomment-2917387367


   > a high-level question: You mentioned this optimization only helps when 
there are "not-to-be-huge number of timers". Is that because the change reduces 
the number of calls from N to N - 1, so for a small N (e.g., from 2 to 1), the 
reduction is significant (50%)—but when N is large, the relative benefit 
becomes minimal?
   
   IIUC, another benefit of this PR is that it removes the intermediate 
PandasDataFrame layer and uses `Row` directly. This also saves 
serialization/deserialization time.
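
To make the arithmetic in the quoted question concrete, here is a minimal sketch. It assumes the reviewer's reading is right, i.e. that the change saves exactly one call out of N; the PR itself does not confirm that here. The function name `relative_call_reduction` is hypothetical and is not part of Spark.

```python
def relative_call_reduction(calls_before: int, calls_saved: int = 1) -> float:
    """Return the fraction of calls removed when `calls_saved` calls go away.

    Assumption (from the quoted question, not from the PR itself):
    the protocol change drops the call count from N to N - 1.
    """
    if calls_before <= 0:
        raise ValueError("calls_before must be positive")
    if not 0 <= calls_saved <= calls_before:
        raise ValueError("calls_saved must be between 0 and calls_before")
    return calls_saved / calls_before


if __name__ == "__main__":
    # The relative saving shrinks as 1/N, so it only matters for small N.
    for n in (2, 10, 100, 1000):
        print(f"N={n}: {relative_call_reduction(n):.1%} fewer calls")
```

Under that assumption the saving is 1/N: going from 2 calls to 1 is the 50% case mentioned in the question, while at N = 1000 it is a tenth of a percent. The serialization saving from using `Row` directly is a separate effect and is not modeled here.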


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

