Re: CREATE GLOBAL FUNCTION works on all databases
Hi!

I also like the idea of a fallback database for functions; it seems like a fairly simple but very useful feature.

One thing I would consider is adding this as a query option instead of a flag, but that is probably harder to implement, so I am OK with adding a flag now and possibly adding a query option later that overrides it.

Regards,
Csaba

On Wed, Nov 9, 2022 at 11:51 AM Johan du Plessis wrote:
> Hi,
>
> I think this is a good idea. It might make it possible to separate
> functions from the core of Impala and have "function packs" with their
> own release schedule, so that adding those functions does not depend on
> an upgrade. E.g. imagine a "geometry function pack" that implements ST_
> functions. It would lower the barrier to entry, speed up development of
> additional functionality, and speed up adoption, because there might be
> no need to upgrade Impala to get new functions.
>
> Regards,
> Johan du Plessis
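The override described in this message (a startup flag now, a query option later that takes precedence) could look roughly like the sketch below. It only illustrates the precedence rule. The class and method names are hypothetical and do not exist in Impala, and treating a blank value as "not set" is an assumption.

```java
import java.util.Optional;

/** Hypothetical helper illustrating flag vs. query option precedence; not part of Impala. */
final class FallbackDbSetting {
    private FallbackDbSetting() {}

    /**
     * Returns the fallback database to use for one query.
     * A non-blank query option wins; otherwise the startup flag applies.
     * An empty result means no fallback database is configured.
     */
    static Optional<String> effectiveFallbackDb(String startupFlag, String queryOption) {
        Optional<String> perQuery = normalize(queryOption);
        return perQuery.isPresent() ? perQuery : normalize(startupFlag);
    }

    // Treats null and blank strings as "not set" (an assumption of this sketch).
    private static Optional<String> normalize(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        // Only the flag is set: the flag value is used.
        System.out.println(effectiveFallbackDb("util_db", null));
        // Both are set: the query option overrides the flag.
        System.out.println(effectiveFallbackDb("util_db", "team_db"));
        // Neither is set: no fallback database.
        System.out.println(effectiveFallbackDb("", ""));
    }
}
```

With this shape, existing deployments keep working off the flag alone, and a per-query value can be introduced later without changing the lookup code that consumes the result.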
Re: CREATE GLOBAL FUNCTION works on all databases
Hi,

I think this is a good idea. It might make it possible to separate functions from the core of Impala and have "function packs" with their own release schedule, so that adding those functions does not depend on an upgrade. E.g. imagine a "geometry function pack" that implements ST_ functions. It would lower the barrier to entry, speed up development of additional functionality, and speed up adoption, because there might be no need to upgrade Impala to get new functions.

Regards,
Johan du Plessis

On Tue, 8 Nov 2022 at 08:32, Quanlong Huang wrote:
> Hi Xiaoqing,
>
> Thanks for raising this request! This requires creating a
> "_impala_global" database in Hive when installing Impala, since each
> function is associated with a db in HMS. It also needs planner changes
> in resolving function names.
>
> Why not just create these "global" UDFs in a util db and use their
> fully qualified names (db_name.function_name)? Queries won't be lengthy
> if a short db name is used.
>
> Regards,
> Quanlong
Re: CREATE GLOBAL FUNCTION works on all databases
Hi Quanlong,

Yes, you can understand it that way. The upper layer implements these UDFs in libimpala.so because they are business-specific functions, so implementing them as built-in functions in Impala is not appropriate. The upper layer maintains these functions.

Executing "use _impala_builtins; create function udf() returns string location '/libimpala.so' symbol='xxx'" throws the exception "Cannot modify system database".

I'll add a fallback db for resolving functions and file a JIRA in Hive.

Thanks for your help.

Regards,
Xiaoqing

On Wed, Nov 9, 2022 at 10:29, Quanlong Huang wrote:
> Hi Xiaoqing,
>
> Just curious, are they migrating from other systems to Impala, and are
> those missing functions built-in functions in that system? We can add
> those missing built-in functions in Impala as well.
>
> Regarding the code change, I think it's harmless to add a fallback db
> for resolving functions. This solution is more lightweight than
> introducing a global function type, which might need design work for
> new privileges.
>
> BTW, it'd be nice if Hive could add this feature too, so we don't
> introduce a new feature gap between Impala and Hive. Feel free to file
> JIRAs if there are no objections in this thread.
>
> Thanks,
> Quanlong
Re: CREATE GLOBAL FUNCTION works on all databases
Hi Xiaoqing,

Just curious, are they migrating from other systems to Impala, and are those missing functions built-in functions in that system? We can add those missing built-in functions in Impala as well.

Regarding the code change, I think it's harmless to add a fallback db for resolving functions. This solution is more lightweight than introducing a global function type, which might need design work for new privileges.

BTW, it'd be nice if Hive could add this feature too, so we don't introduce a new feature gap between Impala and Hive. Feel free to file JIRAs if there are no objections in this thread.

Thanks,
Quanlong

On Tue, Nov 8, 2022 at 3:45 PM xiaoqing gao wrote:
> Hi Quanlong,
>
> Thanks for your advice. I think it's a good approach. But there are at
> least hundreds of queries persisted in scripts. It's unfriendly to ask
> customers to change their queries, so we have no choice but to stay
> compatible.
>
> Suppose I add a global flag, --global_function_database_name="util_db".
> In
> https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionName.java#L126
> first look up the function name in _impala_builtins, then in
> global_function_database_name, and finally in analyzer.getDefaultDb().
>
> I tested it and it works. What do you think?
>
> Regards,
> Xiaoqing
Re: CREATE GLOBAL FUNCTION works on all databases
Hi Quanlong,

Thanks for your advice. I think it's a good approach. But there are at least hundreds of queries persisted in scripts. It's unfriendly to ask customers to change their queries, so we have no choice but to stay compatible.

Suppose I add a global flag, --global_function_database_name="util_db". In
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/FunctionName.java#L126
first look up the function name in _impala_builtins, then in global_function_database_name, and finally in analyzer.getDefaultDb().

I tested it and it works. What do you think?

Regards,
Xiaoqing

On Tue, Nov 8, 2022 at 14:32, Quanlong Huang wrote:
> Hi Xiaoqing,
>
> Thanks for raising this request! This requires creating a
> "_impala_global" database in Hive when installing Impala, since each
> function is associated with a db in HMS. It also needs planner changes
> in resolving function names.
>
> Why not just create these "global" UDFs in a util db and use their
> fully qualified names (db_name.function_name)? Queries won't be lengthy
> if a short db name is used.
>
> Regards,
> Quanlong
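The lookup order proposed in this message (built-ins first, then the database named by the flag, then the session's default database) can be sketched as below. This is a standalone illustration, not the code in FunctionName.java: the class, the in-memory map standing in for the catalog, and the method names are all hypothetical, and the flag itself is only the proposal from this thread.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed lookup order; not Impala's FunctionName.java. */
final class FunctionLookupSketch {
    static final String BUILTINS_DB = "_impala_builtins";

    // Stand-in for the catalog: database name -> names of functions registered in it.
    private final Map<String, Set<String>> functionsByDb;
    // Value of the proposed --global_function_database_name flag; empty means unset.
    private final String globalFunctionDb;

    FunctionLookupSketch(Map<String, Set<String>> functionsByDb, String globalFunctionDb) {
        this.functionsByDb = functionsByDb;
        this.globalFunctionDb = globalFunctionDb == null ? "" : globalFunctionDb;
    }

    /** Returns the database that an unqualified function name resolves to. */
    String resolveDb(String functionName, String defaultDb) {
        // 1. Built-in functions take priority.
        if (dbContains(BUILTINS_DB, functionName)) {
            return BUILTINS_DB;
        }
        // 2. Then the fallback database, if the flag is set and it has the function.
        if (!globalFunctionDb.isEmpty() && dbContains(globalFunctionDb, functionName)) {
            return globalFunctionDb;
        }
        // 3. Otherwise fall through to the session's default database.
        return defaultDb;
    }

    private boolean dbContains(String db, String functionName) {
        return functionsByDb.getOrDefault(db, Collections.emptySet()).contains(functionName);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> catalog = Map.of(
            BUILTINS_DB, Set.of("concat"),
            "util_db", Set.of("mask_phone"),
            "sales", Set.of("local_udf"));
        FunctionLookupSketch lookup = new FunctionLookupSketch(catalog, "util_db");
        System.out.println(lookup.resolveDb("concat", "sales"));
        System.out.println(lookup.resolveDb("mask_phone", "sales"));
        System.out.println(lookup.resolveDb("local_udf", "sales"));
    }
}
```

One consequence of this ordering is that a function in the fallback database shadows a same-named function in the current database, which is worth keeping in mind when choosing which functions to place there.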
Re: CREATE GLOBAL FUNCTION works on all databases
Hi Xiaoqing,

Thanks for raising this request! This requires creating a "_impala_global" database in Hive when installing Impala, since each function is associated with a db in HMS. It also needs planner changes in resolving function names.

Why not just create these "global" UDFs in a util db and use their fully qualified names (db_name.function_name)? Queries won't be lengthy if a short db name is used.

Regards,
Quanlong

On Mon, Nov 7, 2022 at 4:42 PM xiaoqing gao wrote:
> Hi team!
>
> When I execute the CREATE FUNCTION statement, the function only works
> in the one database that I specified. I would like to support a
> feature where, when I execute the following statement, the function
> works in all databases.
>
> The syntax:
>
> CREATE GLOBAL FUNCTION [IF NOT EXISTS]
>   [db_name.]function_name([arg_type[, arg_type...]])
>   RETURNS return_type
>   LOCATION 'hdfs_path_to_dot_so'
>   SYMBOL='symbol_name'
>
> It'll need a default database named _impala_global. The global
> function will be associated with _impala_global.
>
> Do you have any ideas?
>
> Best Regards,
> Xiaoqing Gao
