neilramaswamy opened a new pull request, #47641: URL: https://github.com/apache/spark/pull/47641
### What changes were proposed in this pull request?

These changes ensure that implicit grouping key thread locals are set in two places:

1. When `handleInputRows` is called. This lets the user get or set keyed state in the body of `handleInputRows` before creating the iterator that it returns (see the UT).
2. When methods on the iterator returned from `handleInputRows` are called.

### Why are the changes needed?

Previously, if `handleInputRows` returned a lazy iterator, the following would happen:

1. The implicit grouping key was set in `processNewData`.
2. `handleInputRows` ran and returned an iterator, call it `iter`.
3. The implicit grouping key was unset.
4. When the sink finally caused the iterator to be evaluated, `iter` was invoked but could not find the implicit grouping key.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New UT. All existing UTs should pass.

### Was this patch authored or co-authored using generative AI tooling?

No.
