[PR] [SPARK-54703][PYTHON] Consolidate SQL_GROUPED_AGG_ARROW_ITER_UDF and SQL_GROUPED_AGG_PANDAS_ITER_UDF mapper logic [spark]

via GitHub Tue, 16 Dec 2025 16:17:25 -0800


Yicong-Huang opened a new pull request, #53492:
URL: https://github.com/apache/spark/pull/53492


   ### What changes were proposed in this pull request?
   
   This PR consolidates the identical mapper logic for `SQL_GROUPED_AGG_ARROW_ITER_UDF` and `SQL_GROUPED_AGG_PANDAS_ITER_UDF` in `worker.py`.
   
   Both UDF types share the same iteration pattern for extracting columns from batches; they differ only in the data types being processed (Arrow arrays vs. pandas Series). The two implementations have been merged into a single conditional branch that handles both eval types.
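   A minimal sketch of the consolidated pattern, for illustration only. The actual `worker.py` code is not reproduced in this description, so `make_mapper`, `arg_offsets`, `sum_first_arg`, and the string values given to the two eval-type constants are hypothetical. Because the mapper only selects columns by position, it never needs to know whether a column is an Arrow array or a pandas Series, which is what allows one branch to serve both eval types. Plain Python lists stand in for both column types so the example runs without PyArrow or pandas.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple

# Hypothetical stand-ins for the two PySpark eval-type constants; the real
# values are defined in PySpark and are not reproduced here.
SQL_GROUPED_AGG_ARROW_ITER_UDF = "grouped_agg_arrow_iter"
SQL_GROUPED_AGG_PANDAS_ITER_UDF = "grouped_agg_pandas_iter"

# One batch is a sequence of columns (Arrow arrays or pandas Series in Spark).
Batch = Sequence[Any]


def make_mapper(
    eval_type: str,
    udf: Callable[[Iterator[Tuple[Any, ...]]], Any],
    arg_offsets: List[int],
) -> Callable[[Iterator[Batch]], Any]:
    """Build a mapper for an iterator-style grouped aggregate UDF (hypothetical)."""
    # A single branch serves both eval types: the mapper only selects columns
    # by offset and never inspects the column type.
    if eval_type in (SQL_GROUPED_AGG_ARROW_ITER_UDF, SQL_GROUPED_AGG_PANDAS_ITER_UDF):

        def mapper(batches: Iterator[Batch]) -> Any:
            # Lazily project each batch onto the UDF's argument columns.
            args_iter = (tuple(batch[o] for o in arg_offsets) for batch in batches)
            return udf(args_iter)

        return mapper
    raise ValueError(f"unsupported eval type: {eval_type}")


# Usage: plain lists stand in for Arrow arrays / pandas Series.
def sum_first_arg(args_iter: Iterator[Tuple[Any, ...]]) -> int:
    # Each element is a 1-tuple because arg_offsets has a single entry.
    return sum(sum(col) for (col,) in args_iter)


mapper = make_mapper(SQL_GROUPED_AGG_PANDAS_ITER_UDF, sum_first_arg, [1])
print(mapper(iter([[[0, 0], [1, 2]], [[0], [3]]])))  # 6
```

   Passing `SQL_GROUPED_AGG_ARROW_ITER_UDF` instead produces a mapper with identical behavior, which is the point of merging the branches; any other eval type falls through to the error.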
   
   ### Why are the changes needed?
   
   The two mapper implementations were functionally identical, differing only in variable names and comments. This duplication was unnecessary and made the code harder to maintain.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   

