BryanCutler commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-573958567

This looks pretty good to me @HyukjinKwon, but I have a couple of concerns.

Using type hints can be a clear way to define the inputs/output of a UDF, but it might be problematic for them to be the only way a user can make a PandasUDF. What if the user has a function they cannot modify to include type hints, or is using lambda/partial functions? For these cases, it makes sense to use a wrapper like before, e.g. `pandas_udf(func)`. I'd also worry that forcing type hints might be off-putting to some users, since they are optional and not that widely used. I think it would be good to support a long-form PandasUDF declaration (like proposal 1) and also allow inferring the long form from type hints (like this PR), so users have both options.

I'm also not too sure about changing some of the PandasUDFs to use regular functions, like with `df.groupby.apply(udf)`. I think it makes things less consistent by not using a `pandas_udf` for everything, and it could be inconvenient for the user to keep specifying the schema as an argument for multiple calls, instead of just binding it once with `pandas_udf`. It should still be possible to remove the udf type from the API, so the following could be done, which is almost the same as the proposed change:

```python
df.groupby.apply(pandas_udf(f, schema=...))
```

Anyway, I'm not opposed to anything here and am fine with the new proposal; these are just some slight concerns before we commit to any new changes.
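
To make the "both options" idea concrete, here is a rough, Spark-free sketch. It is only an illustration under stated assumptions: `sketch_udf` and `SketchUDF` are hypothetical names, not PySpark APIs, and plain Python lists stand in for `pandas.Series`. The wrapper binds the return type once, taking it from an explicit argument (long form) or, failing that, from the function's return annotation (short form).

```python
import functools


class SketchUDF:
    """Hypothetical stand-in for a UDF object (not a PySpark class)."""

    def __init__(self, func, return_type):
        self.func = func
        self.return_type = return_type

    def __call__(self, *args):
        return self.func(*args)


def sketch_udf(func, return_type=None):
    """Bind a return type once, either explicitly or from type hints."""
    if return_type is None:
        # Short form: look for a return annotation on the function.
        # Lambdas have empty annotations, and partial objects may have
        # no __annotations__ attribute at all, so fall back to {}.
        hints = getattr(func, "__annotations__", None) or {}
        return_type = hints.get("return")
    if return_type is None:
        raise TypeError("no return type given and none found in type hints")
    return SketchUDF(func, return_type)


def plus_one(values: list) -> list:
    return [v + 1 for v in values]


def scale(values, factor):
    # Imagine this lives in a library the user cannot edit to add hints.
    return [v * factor for v in values]


# Short form: the return type is inferred from the annotation.
inferred = sketch_udf(plus_one)

# Long form: needed for lambdas, partials, and unannotated functions.
doubled = sketch_udf(lambda values: [v * 2 for v in values], return_type=list)
tripled = sketch_udf(functools.partial(scale, factor=3), return_type=list)
```

With a wrapper of this shape, the lambda and partial cases stay usable through the explicit argument, while annotated functions get the shorter declaration.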
