jingz-db commented on PR #51036:
URL: https://github.com/apache/spark/pull/51036#issuecomment-2917387367

> A high-level question: you mentioned this optimization only helps when there are a "not-to-be-huge number of timers". Is that because the change reduces the number of calls from N to N - 1? For a small N (e.g., from 2 to 1) the reduction is significant (50%), but when N is large the relative benefit becomes minimal.

IIUC, another benefit of this PR is that it removes the intermediate pandas DataFrame layer and uses `Row` directly, which also saves serialization/deserialization time.
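The arithmetic behind the question works out as follows. Under the question's own assumption that the change saves exactly one call out of N, the relative saving is 1/N. The sketch below is a hypothetical helper written to illustrate that ratio. It is not code from the PR.

```python
def relative_call_reduction(n: int) -> float:
    """Fraction of calls saved when going from n calls down to n - 1 calls.

    Hypothetical helper: assumes exactly one call is saved, as the question does.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - (n - 1)) / n simplifies to 1 / n
    return (n - (n - 1)) / n


for n in (2, 10, 1000):
    print(f"N={n}: {relative_call_reduction(n):.1%} fewer calls")
```

With N = 2 this gives the 50% mentioned above. At N = 1000 it drops to 0.1%, which is why the saved call matters mostly for small timer counts.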