Thanks all for your input!
I've updated FLIP-57 accordingly. To summarize the changes:
- introduced a new concept of "Temporary system functions", which have no
namespace and override built-in functions
- repositioned "temporary functions" to be those with namespaces and
override
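To make the two kinds of temporary functions in this summary concrete, here is a minimal Java sketch. It is hypothetical: the class, fields, and methods are illustrative and are not Flink's FunctionCatalog API. It assumes that namespaced temporary functions shadow catalog functions with the same 3-part name, and that an unqualified reference is resolved in the order temporary system function, built-in function, temporary catalog function, catalog function, with the last two looked up in the current catalog and database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; NOT Flink's FunctionCatalog API.
public class TempFunctionKinds {

    // Temporary system functions: 1-part names, no namespace; they shadow built-ins.
    final Map<String, String> tempSystem = new HashMap<>();
    // Built-in functions, also addressed by a 1-part name.
    final Map<String, String> builtIn = new HashMap<>();
    // Temporary catalog functions, keyed by "catalog.database.function"; they shadow catalog functions.
    final Map<String, String> tempCatalog = new HashMap<>();
    // Persistent catalog functions, keyed by the same 3-part form.
    final Map<String, String> catalog = new HashMap<>();

    // Resolves an unqualified reference. ASSUMED order: temp system, built-in,
    // then temp catalog and catalog functions in the current catalog/database.
    Optional<String> resolve(String name, String currentCatalog, String currentDb) {
        String qualified = currentCatalog + "." + currentDb + "." + name;
        if (tempSystem.containsKey(name)) return Optional.of(tempSystem.get(name));
        if (builtIn.containsKey(name)) return Optional.of(builtIn.get(name));
        if (tempCatalog.containsKey(qualified)) return Optional.of(tempCatalog.get(qualified));
        return Optional.ofNullable(catalog.get(qualified));
    }

    // Illustrative data: each temporary function shadows a function of the other kind.
    static TempFunctionKinds demo() {
        TempFunctionKinds f = new TempFunctionKinds();
        f.builtIn.put("upper", "built-in upper");
        f.tempSystem.put("upper", "temp system upper");
        f.catalog.put("cat.db.myudf", "catalog myudf");
        f.tempCatalog.put("cat.db.myudf", "temp catalog myudf");
        return f;
    }

    public static void main(String[] args) {
        TempFunctionKinds f = demo();
        System.out.println(f.resolve("upper", "cat", "db").orElse("not found"));
        System.out.println(f.resolve("myudf", "cat", "db").orElse("not found"));
        System.out.println(f.resolve("myudf", "other", "db").orElse("not found"));
    }
}
```

With the demo data, `upper` resolves to the temporary system function and `myudf` to the temporary catalog function, while `myudf` is not found when the current catalog is a different one.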
> ... concerns about accidentally changing the semantics of built-in functions.
> IMO, it can't get much more explicit than the above command.
>
> Sorry for bringing up a new option in the middle of the discussion, but as
> I said, I think it has a bunch of benefits and I don't see major drawbacks
> (maybe you do?).
>
> What do you think?
>
> Hi everyone,
>
> I thought again about option #1 and something that I don't like is that
> the resolved address of xyz is different in "CREATE FUNCTION xyz" and
> "CREATE TEMPORARY FUNCTION xyz".
> IMO, adding the keyword "TEMPORARY" should [...] location might be
> confusing for users.
> After all, a temp function should behave pretty much like any other
> function, except for the fact that it disappears when the session is
> closed.
can use full names to access functions instead of shadowing.
So I think it is a completely new thing, and the direct way to deal with
new things is to add new grammar. So,
+1 for #2, +0 for #3, -1 for #1
Best,
Jingsong Lee
> Approach #3 would be consistent with other db objects and the "CREATE
> FUNCTION" statement.
> Adding system catalog/db seems rather complex, but then again how often do
> we expect users to override built-in functions? If this becomes a major
> Not sure what's the best approach from an internal point of view, but I
> certainly think that consistent behavior is important.
> Hence my votes are:
>
> -1 for #1
> 0 for #2
> 0 for #3
>
> Btw. did we consider a completely separate command for overriding built-in
> functions like "ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ..."?
>
> Cheers, Fabian
>
> On Thu, Sep 19, 2019 at 11:03 JingsongLee wrote:
>> I know Hive and Spark can shadow built-in functions with temporary
>> functions. MySQL, Oracle, and SQL Server cannot shadow them.
>> Users can use full names to access functions instead of shadowing.
>>
>> So I think it is a completely new thing, and the direct way to deal with
>> new things is to add new grammar. So,
>> +1 for #2, +0 for #3, -1 for #1
>>
>> Best,
>> Jingsong Lee
>>
>> --
>> From:Kurt Young
>> Send Time: Thursday, September 19, 2019 16:43
>>
And let me make my vote complete:
-1 for #1
+1 for #2 with a different keyword
-0 for #3
Best,
Kurt
On Thu, Sep 19, 2019 at 4:40 PM Kurt Young wrote:
Looks like I'm the only person who is willing to +1 to #2 for now :-)
But I would suggest changing the keyword from GLOBAL to
something like BUILTIN.
I think #2 and #3 are almost the same proposal, just with a different
format to indicate whether it wants to override built-in functions.
My biggest
Hi,
It is quite a long discussion to follow and I hope I didn’t misunderstand
anything. From the proposals presented by Xuefu I would vote:
-1 for #1 and #2
+1 for #3
Besides #3 being IMO more general and more consistent, having qualified names
(#3) would help/make easier for someone to use
I agree with Xuefu that inconsistent handling with all the other objects is
not a big problem.
Regarding option #3, the special "system.system" namespace may confuse
users.
Users need to know the set of built-in function names to know when to use
"system.system" namespace.
What will happen if
@Dawid, Re: we also don't need additional referencing the special catalog
anywhere.
True. But once we allow such a reference, users can use it in any place
where a function name is expected, and we have to handle all of those places.
That's a big difference, I think.
Thanks,
Xuefu
On Wed, Sep 18,
Re: The reason why I prefer option 3 is that in option 3 all objects
internally are identified with 3 parts.
True, but the problem we have is not about how to differentiate each type of
object internally. Rather, it's about how a user references an
object unambiguously and consistently.
@Bowen I am not suggesting introducing additional catalog. I think we need
to get rid of the current built-in catalog.
@Xuefu in option #3 we also don't need any additional reference to the special
catalog anywhere other than in the CREATE statement. The resolution
behaviour is exactly the same in
Hi Dawid,
"GLOBAL" is a temporary keyword that was given to the approach. It can be
changed to something better.
The difference between this and the #3 approach is that we only need the
keyword for this create DDL. For other places (such as function
referencing), no keyword or special
Hi,
For #2, as Xuefu and I discussed offline, the key point is to introduce a
keyword to SQL DDL to distinguish temp function that override built-in
functions vs. temp functions that override catalog functions. It can be
something other than "GLOBAL", like "BUILTIN" (e.g. "CREATE BUILTIN TEMP
Last additional comment on Option 2. The reason why I prefer option 3 is
that in option 3 all objects internally are identified with 3 parts. This
makes it easier to handle at different locations e.g. while persisting
views, as all objects have uniform representation.
On Thu, 19 Sep 2019, 07:31
Hi,
I think it makes sense to start voting at this point.
Option 1: Only 1-part identifiers
PROS:
- allows shadowing built-in functions
CONS:
- inconsistent with all the other objects, both permanent & temporary
- does not allow shadowing catalog functions
Option 2: Special keyword for built-in
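Option 1 above can be illustrated with a hypothetical Java sketch: temporary functions live in one global namespace keyed by a bare name, so a temporary `upper` is accepted and could shadow a built-in of the same name, but a 3-part name such as `cat.db.f` is rejected, so catalog functions cannot be shadowed. The dot check is deliberately naive (it ignores quoted identifiers), and none of the names come from Flink.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "Option 1: only 1-part identifiers"; not Flink code.
public class OptionOneSketch {

    // All temporary functions share one global namespace keyed by a bare name.
    final Map<String, String> tempFunctions = new HashMap<>();

    // CREATE TEMPORARY FUNCTION under option 1: qualified names are rejected.
    // Naive check: quoted identifiers containing dots are not handled.
    void createTemporaryFunction(String identifier, String impl) {
        if (identifier.contains(".")) {
            throw new IllegalArgumentException(
                    "Option 1 allows only 1-part identifiers: " + identifier);
        }
        tempFunctions.put(identifier, impl);
    }

    public static void main(String[] args) {
        OptionOneSketch s = new OptionOneSketch();
        // PRO: a bare name such as "upper" is accepted, so at resolution time
        // (not shown here) it could shadow the built-in of the same name.
        s.createTemporaryFunction("upper", "my upper");
        System.out.println(s.tempFunctions.get("upper"));
        // CON: a catalog function cannot be shadowed, because 3-part names are rejected.
        try {
            s.createTemporaryFunction("cat.db.f", "my f");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```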
Hi Aljoscha,
Thanks for the summary and these are great questions to be answered. The
answer to your first question is clear: there is a general agreement to
override built-in functions with temp functions.
However, your second and third questions are sort of related, as a function
reference can
Hi,
I think this discussion and the one for FLIP-64 are very connected. To resolve
the differences, I think we have to think about the basic principles and find
consensus there. The basic questions I see are:
- Do we want to support overriding builtin functions?
- Do we want to support
Hi,
+1 to striving to reach consensus on the remaining topics. We are close to
the truth. It will waste a lot of time if we resume the topic some time later.
+1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to
override a catalog function.
I’m not sure about
Hi everyone,
@Xuefu: I would like to avoid adding too many things incrementally.
Users should be able to override all catalog objects consistently
according to FLIP-64 (Support for Temporary Objects in Table module). If
functions are treated completely differently, we need more code and
Hi everyone,
I think this FLIP is very meaningful. It supports functions that can be
shared by different catalogs and DBs, reducing the duplication of functions.
Our group, based on Flink's SQL parser module, implemented the create function
feature and stores the parsed function metadata and schema into
Thanks to Timo and Dawid for sharing thoughts.
It seems to me that there is a general consensus on having temp functions
that have no namespaces and override built-in functions. (As a side note
for compatibility, the current user-defined functions are all temporary and
have no namespaces.)
Hi,
Another idea to consider on top of Timo's suggestion. How about we have a
special namespace (catalog + database) for built-in objects? This catalog
would be invisible for users as Xuefu was suggesting.
Then users could still override built-in functions, if they fully qualify
the object with the
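A hypothetical Java sketch of this idea, assuming built-ins are registered under the hidden namespace `system.system` (the name used elsewhere in this thread) and that an unqualified reference first tries that namespace and then the current catalog and database, with a temporary object shadowing a persistent one of the same 3-part name. The lookup order is an assumption, not something the message specifies, and none of the names are Flink's.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a hidden namespace for built-ins; not Flink code.
public class SystemNamespaceSketch {

    static final String BUILT_IN_NS = "system.system";

    // Everything is keyed by a 3-part name "catalog.database.function".
    final Map<String, String> persistent = new HashMap<>(); // built-ins live under system.system.*
    final Map<String, String> temporary = new HashMap<>();

    // A temporary object shadows a persistent one with the same 3-part name.
    Optional<String> resolveQualified(String fullName) {
        return Optional.ofNullable(temporary.get(fullName))
                .or(() -> Optional.ofNullable(persistent.get(fullName)));
    }

    // ASSUMED order for a 1-part name: hidden built-in namespace, then current catalog/db.
    Optional<String> resolve(String name, String currentCatalog, String currentDb) {
        return resolveQualified(BUILT_IN_NS + "." + name)
                .or(() -> resolveQualified(currentCatalog + "." + currentDb + "." + name));
    }

    public static void main(String[] args) {
        SystemNamespaceSketch s = new SystemNamespaceSketch();
        s.persistent.put("system.system.upper", "built-in upper");

        // A temp function in the current catalog/db does not override the built-in...
        s.temporary.put("cat.db.upper", "temp cat.db.upper");
        System.out.println(s.resolve("upper", "cat", "db").get());     // built-in upper
        System.out.println(s.resolveQualified("cat.db.upper").get()); // temp cat.db.upper

        // ...but fully qualifying it with the hidden namespace does.
        s.temporary.put("system.system.upper", "temp override of upper");
        System.out.println(s.resolve("upper", "cat", "db").get());     // temp override of upper
    }
}
```

The design point is that overriding a built-in becomes an explicit act (naming the special namespace), while the temporary function in `cat.db` stays reachable by its full name.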
Hi Bowen,
I understand the potential benefit of overriding certain built-in
functions. I'm open to such a feature if many people agree. However, it
would be great to still support overriding catalog functions with
temporary functions in order to prototype a query even though a
Hi Fabian,
Yes, I agree 1-part/no-override is the least favorable, so I didn't
include that as a voting option, and the discussion is mainly between
1-part/override builtin and 3-part/not override builtin.
Re > However, it means that temp functions are differently treated than
other db objects.
Hi all,
Thanks Dawid for the additional explanation!
As others summarized there are two questions:
1) Are temporary functions a) top-level functions (1-part address) and not
associated with a catalog/db or b) do we treat them like any other
database object with a 3-part address.
2) If we treat
Hi,
Thanks @Fabian @Dawid and everyone else for sharing your thoughts!
First, I'd like to take Hive built-in functions out of this FLIP to keep
our original scope and make it less controversial on a potential modular
approach. I will remove Hive built-in functions from the google doc.
Then the
Hi Fabian,
Thank you for your response.
Regarding the temporary function, just wanted to clarify one thing: the
3-part identifier does not mean the user always has to provide the catalog
& database explicitly, in the same way that a user does not have to provide
them when, e.g., creating a permanent table,
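A hypothetical helper showing how a partially qualified identifier could be expanded to a 3-part one using the current catalog and database, which is why a 3-part temporary function need not be more verbose to create than a permanent table. It is a sketch, not Flink's identifier parser: it splits on dots naively and does not handle quoted identifiers.

```java
import java.util.List;

// Hypothetical helper; not Flink's identifier parsing.
public class IdentifierQualifier {

    // Fills missing leading parts from the current catalog and database.
    static List<String> qualify(String identifier, String currentCatalog, String currentDb) {
        String[] parts = identifier.split("\\.");
        switch (parts.length) {
            case 1: return List.of(currentCatalog, currentDb, parts[0]);
            case 2: return List.of(currentCatalog, parts[0], parts[1]);
            case 3: return List.of(parts[0], parts[1], parts[2]);
            default: throw new IllegalArgumentException("Too many parts: " + identifier);
        }
    }

    public static void main(String[] args) {
        System.out.println(qualify("f", "cat", "db"));          // [cat, db, f]
        System.out.println(qualify("db2.f", "cat", "db"));      // [cat, db2, f]
        System.out.println(qualify("cat2.db2.f", "cat", "db")); // [cat2, db2, f]
    }
}
```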
Hi all,
I'd like to add my opinion on this topic as well ;-)
In general, I think overriding built-in function with temp functions has a
couple of benefits but also a few challenges:
* Users can reimplement the behavior of a built-in function of a different
system, e.g., for backward
Hi,
W.r.t. temp functions, I feel both options have their benefits and can
theoretically achieve similar functionalities one way or another. In the
end, it's more about use cases, users' habits, and trade-offs.
Re> Not always users are in full control of the catalog functions. There is
also the
I agree the consequences of the decision are substantial. Let's see what
others think.
-- Catalog functions are defined by users, and we suppose they can
drop/alter them in any way they want. Thus, overwriting a catalog function
doesn't seem to be a strong use case that we should be concerned
Hi Dawid,
Thank you for your summary. While the only difference in the two proposals
is one- or three-part in naming, the consequence would be substantial.
To me, there are two major use cases of temporary functions compared to
persistent ones:
1. Temporary in nature and auto-managed by the
Hi Xuefu,
Thank you for your answers.
Let me summarize my understanding. In principle, we differ only in
whether a temporary function can be identified by only a 1-part or only a
3-part identifier. I can reconfirm that if the community decides it
prefers the 1-part approach I will commit to that,
Hi David,
Thanks for sharing your thoughts and requests for clarification. I believe
that I fully understood your proposal, which does have its merits. However,
it's different from ours. Here are the answers to your questions:
Re #1: yes, the temp functions in the proposal are global and have
Hi Xuefu,
Just wanted to summarize my opinion on the one topic (temporary functions).
My preference would be to make temporary functions always 3-part
qualified (as a result that would prohibit overriding built-in
functions). Having said that if the community decides that it's better
to allow
Maybe Xuefu missed my email. Please let me know what your thoughts are on
the summary; if there's still major controversy, I can take time to
reevaluate that part.
On Wed, Sep 4, 2019 at 2:25 PM Xuefu Z wrote:
Thanks all for sharing your thoughts. I think we have gathered some useful
initial feedback from this long discussion with a couple of focal points
sticking out.
We will go back to do more research and adapt our proposal. Once it's
ready, we will ask for a new round of review. If there is any
Hi David,
Thanks for sharing the findings about temporary functions. Because of
strong inconsistency observed in Spark, we can probably ignore it for now.
For Hive, I understand one may not be able to overwrite everything, but the
capability is being offered.
Whether we offer this capability is
Let me try to summarize and conclude the long thread so far:
1. For order of temp function v.s. built-in function:
I think Dawid's point that temp function should be of fully qualified path
is a better reasoning to back the newly proposed order, and I agree we
don't need to follow Hive/Spark.
Hi,
Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
are very inconsistent in that manner (spark being way worse on that).
Hive:
You cannot overwrite all the built-in functions. I could overwrite most
of
Hi all,
thanks for the healthy discussion. It is already a very long discussion
with a lot of text. So I will just post my opinion to a couple of
statements:
> Hive built-in functions are not part of Flink built-in functions,
they are catalog functions
That is not entirely true. Correct
Hi all,
Regarding #1 temp function <> built-in function and naming.
I'm fine with temp functions preceding built-in functions and being able to
override them (we already support overriding built-in functions
in 1.9).
If we don't allow the same name as a built-in function, I'm afraid we
Hi David,
Thank you for sharing your findings. It seems to me that there is no SQL
standard regarding temporary functions. There are few systems that support
it. Here is what I have found:
1. Hive: no DB qualifier allowed. Can overwrite built-in.
2. Spark: basically follows Hive (
Hi all,
Just an opinion on the built-in <> temporary functions resolution and
NAMING issue. I think we should not allow overriding the built-in
functions, as this may pose serious issues and to be honest is rather
not feasible and would require major rework. What happens if a user
wants to
Hi,
I agree with Xuefu that the main controversial points are mainly the two
places. My thoughts on them:
1) Determinism of referencing Hive built-in functions. We can either remove
Hive built-in functions from ambiguous function resolution and require
users to use special syntax for their
in the future?
Best,
Jingsong Lee
--
From:Kurt Young
Send Time: Wednesday, September 4, 2019 10:11
To:dev
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
From what I have seen, there are a couple of focal disagreements:
1. Resolution order: temp function --> Flink built-in function --> catalog
function vs. Flink built-in function --> temp function --> catalog function.
2. "External" built-in functions: how to treat built-in functions in
external
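A hypothetical Java sketch comparing the two resolution orders in point 1. For brevity, each layer is a map keyed by the bare function name, so catalog functions are assumed to be looked up in the current catalog and database. Under the first order a temporary function can shadow a built-in; under the second it cannot, although in both it still shadows a catalog function. None of these names are Flink's.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch comparing two lookup orders; not Flink code.
public class ResolutionOrderSketch {

    // Walks the layers in the given order and returns the first match.
    static Optional<String> resolve(String name, List<Map<String, String>> order) {
        for (Map<String, String> layer : order) {
            String hit = layer.get(name);
            if (hit != null) return Optional.of(hit);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> temp = Map.of("upper", "temp upper", "myudf", "temp myudf");
        Map<String, String> builtIn = Map.of("upper", "built-in upper");
        Map<String, String> catalog = Map.of("myudf", "catalog myudf");

        List<Map<String, String>> tempFirst = List.of(temp, builtIn, catalog);
        List<Map<String, String>> builtInFirst = List.of(builtIn, temp, catalog);

        // Order 1: the temp function shadows the built-in "upper".
        System.out.println(resolve("upper", tempFirst).orElse("not found"));    // temp upper
        // Order 2: the built-in always wins, so the temp "upper" is unreachable by this name.
        System.out.println(resolve("upper", builtInFirst).orElse("not found")); // built-in upper
        // In both orders the temp function shadows the catalog function.
        System.out.println(resolve("myudf", builtInFirst).orElse("not found")); // temp myudf
    }
}
```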
Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
same
as Bowen's. But after thinking about it, I'm currently leaning toward Timo's
suggestion.
The reason is backward compatibility. If we follow Bowen's approach, let's
say we
first find function in Flink's built-in functions, and
Hi all,
Thanks for the feedback. Just a kind reminder that the [Proposal] section
in the google doc was updated, please take a look first and let me know if
you have more questions.
On Tue, Sep 3, 2019 at 4:57 PM Bowen Li wrote:
Hi Timo,
Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convenient for users to use Hive
Hi Jingsong,
Re> 1.Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.
Yes, please see the doc.
Re> 2.Non-flink built-in functions are
Hi Kurt,
Re: > What I want to propose is we can merge #3 and #4, make them both under
>"catalog" concept, by extending catalog function to make it have ability to
>have built-in catalog functions. Some benefits I can see from this
approach:
>1. We don't have to introduce new concept like external
Hi Kurt,
it should not affect the functions and operations we currently have in
SQL. It just categorizes the available built-in functions. It is kind of
an orthogonal concept to the catalog API but built-in functions deserve
this special kind of treatment. CatalogFunction still fits perfectly
Does this only affect the functions and operations we currently have in SQL
and have no effect on tables, right? Looks like this is a concept orthogonal
to Catalog?
If the answers to both are yes, then the catalog function will be a weird
concept?
Best,
Kurt
On Tue, Sep 3, 2019 at 8:10 PM Danny
What you proposed is basically the same as what Calcite does; I think we
are on the same page.
Best,
Danny Chan
On Sep 3, 2019 at 7:57 PM +0800, Timo Walther wrote:
This sounds exactly like the module approach I mentioned, no?
Regards,
Timo
On 03.09.19 13:42, Danny Chan wrote:
Thanks Bowen for bringing up this topic; I think it’s a useful refactoring to make
our function usage more user friendly.
For the topic of how to organize the builtin operators and operators of Hive,
here is a solution from Apache Calcite: the Calcite way is to make every
dialect's operators a
white list?
Once we implement some of these functions as Flink built-ins, we can
also update the whitelist.
Best,
Jingsong Lee
--
From:Kurt Young
Send Time: Tuesday, September 3, 2019 15:41
To:dev
Subject:Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog
Thanks Bowen for driving this.
+1 for the general idea. It makes function resolution behavior clearer
and more deterministic. Besides, users can use all Hive built-in
functions, which is a great feature.
I only have one comment, but it may touch your design, so I think
it would make
Thanks everyone for the feedback.
I have updated the document accordingly. Here's a summary of the changes:
- clarify the concept of temporary functions, to facilitate deciding
function resolution order
- provide two options to support Hive built-in functions, with the 2nd one
being preferred
-
Hi folks,
I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
It's critical to improving function usability in SQL.
https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
In short, it:
- adds support for precise function