HyukjinKwon opened a new pull request, #41563: URL: https://github.com/apache/spark/pull/41563
### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/41316 that sets the batch size to 1 when streaming from Python to the JVM. In addition, this PR contains some slight cleanup.

### Why are the changes needed?

Currently we send 100 groups per batch instead of 100 rows per batch, meaning that each batch carries 100 iterators from 100 UDTF invocations with different arguments. This is because one "row" in `BatchedSerializer` is actually one group from one UDTF invocation within a tuple.

This PR sets the batch size to 1 for now. Ideally we should implement custom logic that batches by row size. However, that requires defining an additional protocol (e.g., to signal where one UDTF invocation ends). Therefore, this PR fixes the issue with a minimal change for now.

### Does this PR introduce _any_ user-facing change?

No, UDTF is not released to end users yet.

### How was this patch tested?

Manually tested.
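The batching behavior described under "Why are the changes needed?" can be illustrated with a small standalone sketch. This is a simplified model and not Spark's actual `BatchedSerializer` code: the helpers `batched`, `udtf_outputs`, and `rows_per_batch` are hypothetical names introduced only for illustration, and the assumption is that the serializer counts items (here, one group per UDTF invocation) rather than the rows inside each item.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int]
Group = List[Row]  # all rows produced by one (hypothetical) UDTF invocation


def batched(items: Iterable[Group], batch_size: int) -> Iterator[List[Group]]:
    """Yield lists of up to `batch_size` items.

    Items are counted as-is; the rows inside each item are not inspected.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def udtf_outputs(num_invocations: int, rows_per_invocation: int) -> Iterator[Group]:
    """Simulate UDTF invocations, each producing one group of rows."""
    for i in range(num_invocations):
        yield [(i, j) for j in range(rows_per_invocation)]


def rows_per_batch(batch_size: int, num_invocations: int, rows_per_invocation: int) -> List[int]:
    """Return the number of rows carried by each batch."""
    groups = udtf_outputs(num_invocations, rows_per_invocation)
    return [sum(len(g) for g in batch) for batch in batched(groups, batch_size)]


if __name__ == "__main__":
    # Batch size 100 counts groups, so a batch can hold far more than 100 rows.
    print(rows_per_batch(100, 250, 1000))
    # Batch size 1 sends one group (one invocation's output) per batch.
    print(rows_per_batch(1, 250, 1000)[:3])
```

In this model, a batch size of 100 with 1000 rows per invocation produces batches of up to 100,000 rows, while a batch size of 1 bounds each batch to the output of a single invocation. Batching by actual row count would need a marker for where one invocation's output ends, which is the extra protocol the PR defers.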
