neilramaswamy opened a new pull request, #47641:
URL: https://github.com/apache/spark/pull/47641

   ### What changes were proposed in this pull request?
   
   These changes ensure that implicit grouping key thread locals are set in two 
places:
   
   1. When `handleInputRows` is called. This lets the user get/set keyed state in the body of `handleInputRows` before creating the iterator they return (see the new UT).
   2. When methods on the iterator returned by `handleInputRows` are called.
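
A minimal Scala sketch of the two points above. It is not Spark's implementation: `ImplicitKeyHolder`, `KeyedIterator`, and `GroupDriver.processGroup` are hypothetical stand-ins for the implicit grouping key thread-local and the code that drives `handleInputRows`. The key is set while the `handleInputRows` body runs (point 1) and again around every `hasNext`/`next` call on the iterator it returns (point 2).

```scala
// Hypothetical stand-in for the implicit grouping key thread-local (not Spark's class).
object ImplicitKeyHolder {
  private val current = new ThreadLocal[Option[Any]] {
    override protected def initialValue(): Option[Any] = None
  }

  def get: Option[Any] = current.get()

  // Runs `body` with `key` set, then restores whatever value was set before.
  def withKey[T](key: Any)(body: => T): T = {
    val previous = current.get()
    current.set(Some(key))
    try body
    finally current.set(previous)
  }
}

// Point 2 (hypothetical wrapper): every call on the returned iterator runs with
// the key set, so lazy transformations inside it can still read keyed state.
class KeyedIterator[T](key: Any, underlying: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = ImplicitKeyHolder.withKey(key)(underlying.hasNext)
  override def next(): T = ImplicitKeyHolder.withKey(key)(underlying.next())
}

object GroupDriver {
  // Hypothetical per-group driver standing in for the operator's processing loop.
  def processGroup[K, I, O](key: K, rows: Iterator[I])(
      handleInputRows: (K, Iterator[I]) => Iterator[O]): Iterator[O] = {
    // Point 1: the key is set while the body of handleInputRows runs.
    val out = ImplicitKeyHolder.withKey(key)(handleInputRows(key, rows))
    new KeyedIterator(key, out)
  }
}
```

Wrapping `hasNext` as well as `next` matters because some iterators (for example, filtering ones) do their work in `hasNext`. Restoring the previous value instead of always clearing it keeps the helper safe if calls ever nest.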
   
   
   ### Why are the changes needed?
   
   Previously, if `handleInputRows` returned a lazy iterator, then the 
following would happen:
   
   1. The implicit grouping key was set in `processNewData`
   2. `handleInputRows` ran, and returned an iterator, call it `iter`
   3. The implicit grouping key was unset
   4. When the sink finally evaluates `iter`, its methods run after step 3, so they cannot find the implicit grouping key
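
The four steps above can be reproduced with a small Scala sketch. `PreFixDemo` is hypothetical and not Spark code; it models "cannot find the implicit grouping key" simply as the thread-local reading `None`.

```scala
// Hypothetical reproduction of the pre-fix ordering; not Spark code.
object PreFixDemo {
  private val implicitKey = new ThreadLocal[Option[String]] {
    override protected def initialValue(): Option[String] = None
  }

  def currentKey: Option[String] = implicitKey.get()

  // Steps 1-3: set the key, call handleInputRows, then unset the key.
  def processNewData[O](key: String, rows: Iterator[Int])(
      handleInputRows: Iterator[Int] => Iterator[O]): Iterator[O] = {
    implicitKey.set(Some(key))
    try handleInputRows(rows)
    finally implicitKey.remove()
  }

  def run(): List[(Option[String], Int)] = {
    // handleInputRows returns a lazy iterator: the map body has not run yet.
    val iter = processNewData("k1", Iterator(1, 2)) { rows =>
      rows.map(r => (currentKey, r))
    }
    // Step 4: the sink consumes `iter` only after the key was unset.
    iter.toList
  }
}
```

Calling `PreFixDemo.run()` yields `List((None,1), (None,2))`: both rows are mapped after step 3, so neither sees `k1`.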
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New UT. All existing UTs should pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.