[GitHub] [spark] MrBago opened a new pull request #26442: [SPARK-28978 ] Support > 256 args to python udf

GitBox Fri, 08 Nov 2019 16:05:35 -0800

URL: https://github.com/apache/spark/pull/26442
 
 
   ### What changes were proposed in this pull request?
   
   On the worker, we express lambda functions as strings and then `eval` them to
create a "mapper" function. This makes the code hard to read and limits the
number of arguments a UDF can support to 255 for Python 3.6 and earlier.
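A minimal sketch of the "lambda string" pattern, for illustration only: `make_mapper_from_string` and `arg_offsets` are hypothetical names, not the actual PySpark worker code. The mapper's source text is assembled with one `a[i]` per argument and compiled with `eval`, so a UDF over 300 columns yields a call expression with 300 explicit arguments, which Python 3.6 and earlier reject with a `SyntaxError`.

```python
def make_mapper_from_string(udf, arg_offsets):
    """Build a mapper by generating and eval-ing lambda source text.

    Hypothetical illustration of the string-based approach, not PySpark's code.
    """
    # e.g. arg_offsets=[0, 2] -> "lambda a: f(a[0], a[2])"
    call_args = ", ".join("a[%d]" % o for o in arg_offsets)
    source = "lambda a: f(%s)" % call_args
    # eval compiles the generated text. On Python <= 3.6, a call expression
    # with more than 255 explicit arguments fails here with a SyntaxError.
    return eval(source, {"f": udf})


# Usage: a UDF over columns 0 and 2 of a row
mapper = make_mapper_from_string(lambda *xs: sum(xs), [0, 2])
print(mapper([1, 10, 100]))  # 101
```

Besides the argument limit, the logic lives in a string, so the code that actually runs never appears in the source file.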
   
   This PR rewrites the mapper functions as nested functions instead of "lambda
strings", which allows passing more than 255 arguments.
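The nested-function alternative can be sketched the same way, again with hypothetical names rather than the PR's actual code. The mapper is an ordinary closure over the UDF and its argument offsets, and the arguments are passed with `*` unpacking at run time. Because no call expression with hundreds of explicit arguments is ever compiled, the pre-3.7 limit on call arguments does not come into play, and there is no generated source to read or debug.

```python
def make_mapper(udf, arg_offsets):
    """Build a mapper as a closure instead of eval-ing generated source.

    Hypothetical illustration of the nested-function approach.
    """
    offsets = list(arg_offsets)  # snapshot the column positions

    def mapper(a):
        # Gather the UDF's inputs from the row and pass them with * unpacking;
        # the argument count is only known at run time, so the compile-time
        # limit on explicit call arguments is not involved.
        return udf(*[a[o] for o in offsets])

    return mapper


# Usage: a UDF that receives 300 columns from a 1000-column row
row = list(range(1000))
many = make_mapper(lambda *xs: len(xs), range(0, 600, 2))
print(many(row))  # 300
```

The UDF in this sketch is declared with `*xs`; a UDF written with more than 255 named parameters would still be rejected by Python 3.6 and earlier at definition time, independently of how the mapper calls it.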
   
   
   ### Why are the changes needed?
   The JIRA ticket associated with this issue describes how MLflow uses UDFs to
consume columns as features. This pattern is not unique, and a limit of 255
features is quite low.
   
   ### Does this PR introduce any user-facing change?
   Yes. Users can now pass more than 255 columns to a UDF.
   
   
   ### How was this patch tested?
   Added a unit test that passes more than 255 arguments to a UDF.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
