rxin commented on issue #27165: [SPARK-28264][PYTHON][SQL] Support type hints in pandas UDF and rename/move inconsistent pandas UDF types
URL: https://github.com/apache/spark/pull/27165#issuecomment-573965654

On the first one they can wrap an existing function can’t they? They don’t need to modify it.

On Mon, Jan 13, 2020 at 5:43 PM Bryan Cutler <[email protected]> wrote:

> This looks pretty good to me @HyukjinKwon <https://github.com/HyukjinKwon>,
> but I have a couple concerns
>
> Using type hints can be a clear way to define the inputs/output of a UDF,
> but it might be problematic to be the only way a user can make a PandasUDF.
> What about if the user has a function they cannot modify to include type
> hints, or using lambda/partial functions? For these cases, it makes sense
> to use a wrapper like before, e.g. pandas_udf(func). I'd also worry that
> forcing type hints might be off-putting to some users, since they are not
> that widely used or optional. I think it would be good to support a
> long-form PandasUDF declaration (like proposal 1) and also allow to infer
> the long-form from type hints (like this PR), so users have both options.
>
> I'm also not too sure about changing some of the PandasUDFs to use regular
> functions, like with df.groupby.apply(udf). I think it makes things less
> consistent by not using a pandas_udf for everything, and it could be
> inconvenient for the user to keep specifying the schema as an argument for
> multiple calls, instead of just binding it once with pandas_udf. It
> should be possible to still remove the udf type from the API so the
> following could be done which is almost similar to the proposed change:
>
> df.groupby.apply(pandas_udf(f, schema=...))
>
> Anyway, I'm not opposed to anything here and fine with the new proposal,
> just some slight concerns before we commit to any new changes
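For illustration, the wrapping idea raised in the reply can be sketched in plain Python. This is only a sketch under stated assumptions, not the API from the PR: `legacy_scale`, `double`, and `double_typed` are hypothetical names, and plain lists stand in for pandas Series so the snippet runs without Spark or pandas. It shows that a function without type hints, or a `partial` built from it, can be given hints through a thin wrapper while the original stays unmodified.

```python
from functools import partial
from typing import List, get_type_hints


def legacy_scale(values, factor):
    """Hypothetical function the user cannot edit; it has no type hints."""
    return [v * factor for v in values]


# A partial application carries no annotations of its own.
double = partial(legacy_scale, factor=2.0)


def double_typed(values: List[float]) -> List[float]:
    """Thin annotated wrapper; the wrapped callable is left untouched."""
    return double(values)


# A hint-driven API could read the wrapper's annotations like this.
hints = get_type_hints(double_typed)
print(hints)
print(double_typed([1.0, 2.5]))
```

The cost is one extra function definition per wrapped callable, which is the trade-off against a long-form declaration such as pandas_udf(func) mentioned in the quoted comment.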
