Re: [DISCUSS] solve unstable build capacity problem on TravisCI

2019-07-01 Thread Bowen Li
nd build information
> in
> > the travis
> >  set +x
> >  echo "-"
> >  echo "Looks like travis-ci is not configured for your fork."
> >  echo "Please setup by swich on 'zeppelin' repository at
> > https://travis-ci.org/profile and travis-ci."
> >  echo "And then make sure 'Build branch updates' option is enabled in
> > the settings https://travis-ci.org/${AUTHOR}/zeppelin/settings.";
> >  echo ""
> >  echo "To trigger CI after setup, you will need ammend your last
> commit
> > with"
> >  echo "git commit --amend"
> >  echo "git push your-remote HEAD --force"
> >  echo ""
> >  echo "See
> >
> http://zeppelin.apache.org/contribution/contributions.html#continuous-integration
> > ."
> >fi
> >
> >exit $RET_CODE
> > else
> >set +x
> >echo "travis_check.py does not exists"
> >exit 1
> > fi
> >
> > Chesnay Schepler wrote on Saturday, June 29, 2019 at 3:17 PM:
> >
> >> Does this imply that a Jenkins job is active as long as the Travis build
> >> runs?
> >>
> >> On 26/06/2019 21:28, Bowen Li wrote:
> >>> Hi,
> >>>
> >>> @Dawid, I think the "long test running" issue I mentioned in the first
> >>> email belongs, as you also said, to "a big effort which is much harder to
> >>> accomplish in a short period of time and may deserve its own separate
> >>> discussion". Thus I didn't include it in what we can do in the foreseeable
> >>> short term.
> >>>
> >>> Besides, I don't think that's the ultimate reason for the lack of build
> >>> resources. Even if the build were shortened to something like 2h, the
> >>> problem I described - no build machine working for 6 or more hours during
> >>> PST daytime - would still happen, because no machine from ASF INFRA's
> >>> pool is allocated to Flink. Having paid close attention to the build
> >>> queue over the past few weekdays, the pattern is pretty clear to me now.
> >>>
> >>> **The ultimate root cause** is that we don't have any **dedicated**
> >>> build resources we can stably rely on. I'm actually OK with waiting a
> >>> long time if build requests are running - that at least means we are
> >>> making progress. But I'm not OK with having no build resources at all.
> >>> A better place to aim at in the short term is to always have at least a
> >>> central pool (say 3 or 5) of machines dedicated to building Flink at any
> >>> time, or maybe to use users' own resources.
> >>>
> >>> @Chesnay @Robert I synced with Jeff offline that Zeppelin community is
> >>> using a Jenkins job to automatically build on users' travis account and
> >>> link the result back to github PR. I guess the Jenkins job would fetch
> >>> latest upstream master and build the PR against it. Jeff has filed
> >>> tickets to learn and get access to the Jenkins infra. It'll be better to
> >>> fully understand it first before judging this approach.
> >>>
> >>> I also heard good things about CircleCI, and ASF INFRA seems to have a
> >> pool
> >>> of build capacity there too. Can be an alternative to consider.
> >>>
> >>>
> >>> On Wed, Jun 26, 2019 at 12:44 AM Dawid Wysakowicz <
> >> dwysakow...@apache.org>
> >>> wrote:
> >>>
> >>>> Sorry to jump in late, but I think Bowen missed the most important
> point
> >>>> from Chesnay's previous message in the summary. The ultimate reason
> for
> >>>> all the problems is that the tests take close to 2 hours to run
> already.
> >>>> I fully support this claim: "Unless people start caring about test
> times
> >>>> before adding them, this issue cannot be solved"
> >>>>
> >>>> This is also another reason why using user's Travis account won't
> help.
> >>>> Every few weeks we reach the user's time limit for a single profile.
> >>>> This makes the user's builds simply fail, u

Re: [DISCUSS] solve unstable build capacity problem on TravisCI

2019-07-02 Thread Bowen Li
t; "Our rough metrics shows that Flink used over 5800 hours of build time
> > last month. That is equal to EIGHT servers running 24/7 for the ENTIRE
> > MONTH. EIGHT. nonstop.
> > When we discovered this last night, we discussed it some and are going
> > to tune down Flink to allow only five executors maximum. We cannot
> > allow Flink to consume so much of a Foundation shared resource."
> >
> > So yes, we either
> > a) have to heavily reduce our CI usage or
> > b) fund our own, either maintaining it ourselves or donating to Apache.
> >
> > On 02/07/2019 05:11, Bowen Li wrote:
> >> By looking at the git history of the Jenkins script, its core part
> >> was finished in March 2017 (and only two minor update in 2017/2018),
> >> so it's been running for over two years now and feels like Zepplin
> >> community has been quite happy with it. @Jeff Zhang
> >> can you share your insights and user
> >> experience with the Jenkins+Travis approach?
> >>
> >> Things like:
> >>
> >> - has the approach completely solved the resource capacity problem
> >> for Zepplin community? is Zepplin community happy with the result?
> >> - is the whole configuration chain stable (e.g. uptime) enough?
> >> - how often do you need to maintain the Jenkins infra? how many
> >> people are usually involved in maintenance and bug-fixes?
> >>
> >> The downside of this approach seems mostly to be on the maintenance
> >> to me - maintain the script and Jenkins infra.
> >>
> >> ** Having Our Own Travis-CI.com Account **
> >>
> >> Another alternative I've been thinking of is to have our own
> >> travis-ci.com account with paid dedicated resources. Note that
> >> travis-ci.org is the free version and travis-ci.com is the commercial
> >> version. We currently use a shared resource pool managed by the ASF INFRA
> >> team on travis-ci.org, but we have no control over it - we can't see how
> >> it's configured, how many resources are available, how resources are
> >> allocated among Apache projects, etc.
> >> The nice things about having an account on travis-ci.com are:
> >>
> >> - relatively low cost with much better resource guarantee than what
> >> we currently have [1]: $249/month with 5 dedicated concurrency,
> >> $489/month with 10 concurrency
> >> - low maintenance work compared to using Jenkins
> >> - (potentially) no migration cost according to Travis's doc [2]
> >> (pending verification)
> >> - full control over the build capacity/configuration compared to
> >> using ASF INFRA's pool
> >>
> >> I'd be surprised if we as such a vibrant community cannot find and
> >> fund $249*12=$2988 a year in exchange for a much better developer
> >> experience and much higher productivity.
> >>
> >> [1] https://travis-ci.com/plans
> >> [2]
> >>
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> >>
> >> On Sat, Jun 29, 2019 at 8:39 AM Chesnay Schepler <ches...@apache.org> wrote:
> >>
> >> So yes, the Jenkins job keeps pulling the state from Travis until it
> >> finishes.
> >>
> >> Not sure I'm comfortable with the idea of using Jenkins workers just to
> >> idle for several hours.
> >>
> >> On 29/06/2019 14:56, Jeff Zhang wrote:
> >> > Here's what zeppelin community did, we make a python script to
> >> check the
> >> > build status of pull request.
> >> > Here's script:
> >> > https://github.com/apache/zeppelin/blob/master/travis_check.py
> >> >
> >> > And this is the script we used in Jenkins build job.
> >> >
> >> > if [ -f "travis_check.py" ]; then
> >> >git log -n 1
> >> >STATUS=$(curl -s $BUILD_URL | grep -e "GitHub pull
> >> request.*from.*" | sed
> >> > 's/.*GitHub pull request  >> > href=\"\(https[^"]*\).*from[^"]*.\(https[^"]*\).*/\1 \2/g')
> >> >AUTHOR=$(echo $STATUS | sed 's/.*[/]\(.*\)$/\1/g')
> >> >PR=$(echo $STATUS | awk &#
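
For illustration, the polling idea boils down to something like the rough
sketch below (this is not the actual travis_check.py; the Travis API v3
endpoint, query parameters, and token handling are assumptions made for the
example):

  #!/usr/bin/env bash
  # Hypothetical sketch of a Jenkins-side poller: wait for the latest Travis
  # build of the author's fork/branch and exit with its result.
  # AUTHOR, BRANCH and TRAVIS_TOKEN are placeholders supplied by the job.
  REPO_SLUG="${AUTHOR}%2Fzeppelin"   # URL-encoded "<author>/zeppelin"
  API="https://api.travis-ci.org"

  while true; do
    STATE=$(curl -s -H "Travis-API-Version: 3" \
                 -H "Authorization: token ${TRAVIS_TOKEN}" \
                 "${API}/repo/${REPO_SLUG}/builds?branch.name=${BRANCH}&limit=1" \
            | jq -r '.builds[0].state')
    case "${STATE}" in
      passed) exit 0 ;;                    # Jenkins marks the job green
      failed|errored|canceled) exit 1 ;;   # Jenkins marks the job red
      *) echo "build is ${STATE}, waiting..."; sleep 60 ;;
    esac
  done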

Re: Source Kafka and Sink Hive managed tables via Flink Job

2019-07-03 Thread Bowen Li
Hi Youssef,

You need to provide more background context:

- Which Hive sink are you using? We are working on the official Hive sink
for community and will be released in 1.9. So did you develop yours in
house?
- What do you mean by 1st, 2nd, 3rd window? Do you mean the parallel instances
of the same operator, or do you have 3 windowing operations chained?
- What does your Hive table look like? E.g. is it partitioned or
non-partitioned? If partitioned, how many partitions do you have? is it
writing in static partition or dynamic partition mode? what format? how
large?
- What does your sink do - is each parallelism writing to multiple
partitions or a single partition/table? Is it only appending data or
upserting?

On Wed, Jul 3, 2019 at 1:38 AM Youssef Achbany 
wrote:

> Dear all,
>
> I'm working on a big project and one of the challenges is to read Kafka
> topics and copy them via Hive commands into Hive managed tables in order to
> enable ACID Hive properties.
>
> I tried it but I have an issue with back pressure:
> - The first window read 20,000 events and wrote them into Hive tables
> - The second, third, ... send only 100 events because the write into Hive
> takes more time than the read from a Kafka topic. But writing 100 events or
> 50,000 events takes +/- the same time for Hive.
>
> Has someone already built this source and sink? Could you help on this?
> Or do you have some tips?
> It seems that defining a window size based on the number of events instead of
> time is not possible. Is that true?
>
> Thank you for your help
>
> Youssef
>
> --
> ♻ Be green, keep it on the screen
>


Re: Source Kafka and Sink Hive managed tables via Flink Job

2019-07-03 Thread Bowen Li
BTW,  I'm adding user@ mailing list since this is a user question and
should be asked there.

dev@ mailing list is only for discussions of Flink development. Please see
https://flink.apache.org/community.html#mailing-lists

On Wed, Jul 3, 2019 at 12:34 PM Bowen Li  wrote:

> Hi Youssef,
>
> You need to provide more background context:
>
> - Which Hive sink are you using? We are working on the official Hive sink
> for community and will be released in 1.9. So did you develop yours in
> house?
> - What do you mean by 1st, 2nd, 3rd window? Do you mean the parallel
> instances of the same operator, or do you have 3 windowing operations
> chained?
> - What does your Hive table look like? E.g. is it partitioned or
> non-partitioned? If partitioned, how many partitions do you have? is it
> writing in static partition or dynamic partition mode? what format? how
> large?
> - What does your sink do - is each parallelism writing to multiple
> partitions or a single partition/table? Is it only appending data or
> upserting?
>
> On Wed, Jul 3, 2019 at 1:38 AM Youssef Achbany <
> youssef.achb...@euranova.eu> wrote:
>
>> Dear all,
>>
>> I'm working on a big project and one of the challenges is to read Kafka
>> topics and copy them via Hive commands into Hive managed tables in order to
>> enable ACID Hive properties.
>>
>> I tried it but I have an issue with back pressure:
>> - The first window read 20,000 events and wrote them into Hive tables
>> - The second, third, ... send only 100 events because the write into Hive
>> takes more time than the read from a Kafka topic. But writing 100 events or
>> 50,000 events takes +/- the same time for Hive.
>>
>> Has someone already built this source and sink? Could you help on this?
>> Or do you have some tips?
>> It seems that defining a window size based on the number of events instead
>> of time is not possible. Is that true?
>>
>> Thank you for your help
>>
>> Youssef
>>
>> --
>> ♻ Be green, keep it on the screen
>>
>


Re: [DISCUSS] solve unstable build capacity problem on TravisCI

2019-07-03 Thread Bowen Li
Re: > Are they using their own Travis CI pool, or did they switch to an
entirely different CI service?

I reached out to Wes and Krisztián from the Apache Arrow PMC. They are
currently moving away from ASF's Travis to their own in-house bare-metal
machines at [1] with a custom CI application at [2]. They've seen significant
improvement in both respects - much higher performance and basically no
resource waiting time - a "night-and-day" difference, quoting Wes.

Re: > If we can just switch to our own Travis pool, just for our project,
then this might be something we can do fairly quickly?

I believe so, according to [3] and [4]


[1] https://ci.ursalabs.org/
[2] https://github.com/ursa-labs/ursabot
[3] https://docs.travis-ci.com/user/migrate/open-source-repository-migration
[4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com



On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler  wrote:

> Are they using their own Travis CI pool, or did they switch to an
> entirely different CI service?
>
> If we can just switch to our own Travis pool, just for our project, then
> this might be something we can do fairly quickly?
>
> On 03/07/2019 05:55, Bowen Li wrote:
> > I responded in the INFRA ticket [1] that I believe they are using the wrong
> > metric against Flink, and that total build time is a completely different
> > thing from guaranteed build capacity.
> >
> > My response:
> >
> > "As mentioned above, since I started to pay attention to Flink's build
> > queue a few tens of days ago, I'm in Seattle and I saw no build was
> kicking
> > off in PST daytime in weekdays for Flink. Our teammates in China and
> Europe
> > have also reported similar observations. So we need to evaluate how the
> > large total build time came from - if 1) your number and 2) our
> > observations from three locations that cover pretty much a full day, are
> > all true, I **guess** one reason can be that - highly likely the extra
> > build time came from weekends when other Apache projects may be idle and
> > Flink just drains hard its congested queue.
> >
> > Please be aware that we're not complaining about the lack of resources in
> > general; I'm complaining about the lack of **stable, dedicated** resources.
> > An example of the latter: currently, even if no build is in Flink's queue
> > and I submit a request to be the queue head in the PST morning, my build
> > won't even start within 6-8+ hours. That is an absurd amount of waiting
> > time.
> >
> > That is to say, if ASF INFRA decides to adopt a quota system and grants
> > Flink five DEDICATED servers that run all the time, only for Flink, that'll
> > be PERFECT and would totally solve our problem.
> >
> > I feel what's missing in the ASF INFRA's Travis resource pool is some
> level
> > of build capacity SLAs and certainty"
> >
> >
> > Again, I believe these two problems differ in nature: long build time vs.
> > lack of dedicated build resources. That is to say, shortening the build
> > time may or may not relieve the situation. I'm slightly negative on
> > disabling IT cases for PRs - the downside is that we'd risk missing bugs in
> > a PR that the UTs don't catch, which may cost a lot more to fix and may
> > slow down or even block others - but I am open to others' opinions on it.
> >
> > AFAICT from the INFRA ticket [1], donating to ASF INFRA won't be a feasible
> > way to solve our problem, since INFRA's pool is fully shared and they have
> > no control over, or finer insight into, resource allocation to a specific
> > Apache project. As mentioned in [1], Apache Arrow is moving away from the
> > ASF INFRA Travis pool (they are actually surprised Flink hasn't planned to
> > do so). I know that Spark is on its own build infra. If we all agree on
> > funding our own build infra, I'd 

Re: [VOTE] Migrate to sponsored Travis account

2019-07-04 Thread Bowen Li
+1 on approving the migration to our own Travis account. The foreseeable
benefits to the whole community's productivity and iteration speed would be
significant!

I think whether to use Flinkbot or the Travis REST API is an implementation
detail. Once we determine the overall direction, the details can be figured
out.
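
For reference, triggering a build through the Travis REST API (v3) is a single
authenticated POST against a repository's requests endpoint. A minimal sketch,
where the repo slug, branch, and token are placeholders rather than any actual
Flink or Ververica setup:

  # Hypothetical sketch: ask Travis (API v3) to build a given branch of a repo.
  REPO_SLUG="some-org%2Fsome-repo"   # URL-encoded "owner/repo", placeholder
  BRANCH="master"
  curl -s -X POST "https://api.travis-ci.com/repo/${REPO_SLUG}/requests" \
    -H "Travis-API-Version: 3" \
    -H "Content-Type: application/json" \
    -H "Authorization: token ${TRAVIS_TOKEN}" \
    -d "{\"request\": {\"branch\": \"${BRANCH}\"}}"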

The good news is that, from my research on how Arrow and Spark integrate their
own in-house CI services with their GitHub repos, they are both using bots with
the GitHub API. See a typical PR check for those projects at [1] and [2]. Thus,
we are **not alone** on this path.

Specifically for Apache Arrow, they have 'Ursabot', similar to our Flinkbot,
as I shared the link in the discussion. [3] lays out how Ursabot works and
integrates with the GitHub API to trigger builds. I think their documentation
is a bit outdated though - the doc says it cannot report build status back to
GitHub, but from [1] we can see that the build statuses are actually reported.
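
The "report back" half that such bots perform is essentially a commit status
posted through the GitHub Statuses API. A rough sketch, with owner, repo,
commit SHA, build URL, and token all placeholders:

  # Hypothetical sketch: report a CI result for a commit via the GitHub
  # Statuses API; the status then shows up on the corresponding PR.
  curl -s -X POST "https://api.github.com/repos/${OWNER}/${REPO}/statuses/${COMMIT_SHA}" \
    -H "Authorization: token ${GITHUB_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"state\": \"success\", \"context\": \"ci/custom-build\", \"target_url\": \"${BUILD_URL}\", \"description\": \"Build passed\"}"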

@Chesnay thanks for taking actions on this. Though I don't have access to
settings of Flink's github repo, I will continue to help push this
initiative in whichever way I can. Wes and Krisztián from Arrow are also
very friendly and helpful, and I can connect you to them to learn their
experience.

[1] https://github.com/apache/arrow/pull/4809
[2] https://github.com/apache/spark/pull/25053
[3] https://github.com/ursa-labs/ursabot#driving-ursabot


On Thu, Jul 4, 2019 at 6:42 AM Hequn Cheng  wrote:

> +1.
>
> And thanks a lot to Chesnay for pushing this.
>
> Best, Hequn
>
> On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler 
> wrote:
>
>> Note that the Flinkbot approach isn't that trivial either; we can't
>> _just_ trigger builds for a branch in the apache repo, but would first
>> have to clone the branch/pr into a separate repository (that is owned by
>> the github account that the travis account would be tied to).
>>
>> One roadblock after the next showing up...
>>
>> On 04/07/2019 11:59, Chesnay Schepler wrote:
>> > Small update with mostly bad news:
>> >
>> > INFRA doesn't know whether it is possible, and referred my to Travis
>> > support.
>> > They did point out that it could be problematic in regards to
>> > read/write permissions for the repository.
>> >
>> > From my own findings /so far/ with a test repo/organization, it does
>> > not appear possible to configure the Travis account used for a
>> > specific repository.
>> >
>> > So yeah, if we go down this route we may have to pimp the Flinkbot to
>> > trigger builds through the Travis REST API.
>> >
>> > On 04/07/2019 10:46, Chesnay Schepler wrote:
>> >> I've raised a JIRA
>> >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
>> >> inquire whether it would be possible to switch to a different Travis
>> >> account, and if so what steps would need to be taken.
>> >> We need a proper confirmation from INFRA since we are not in full
>> >> control of the flink repository (for example, we cannot access the
>> >> settings page).
>> >>
>> >> If this is indeed possible, Ververica is willing to sponsor a Travis
>> >> account for the Flink project.
>> >> This would provide us with more than enough resources than we need.
>> >>
>> >> Since this makes the project more reliant on resources provided by
>> >> external companies I would like to vote on this.
>> >>
>> >> Please vote on this proposal, as follows:
>> >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
>> >> account, provided that INFRA approves
>> >> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> >> account
>> >>
>> >> The vote will be open for at least 24h, and until we have
>> >> confirmation from INFRA. The voting period may be shorter than the
>> >> usual 3 days since our current is effectively not working.
>> >>
>> >> On 04/07/2019 06:51, Bowen Li wrote:
>> >>> Re: > Are they using their own Travis CI pool, or did the switch to
>> >>> an entirely different CI service?
>> >>>
>> >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>> >>> currently moving away from ASF's Travis to their own in-house metal
>> >>> machines at [1] with custom CI application at [2]. They've seen
>> >>> significant improvement w.r.t both much higher performance and
>> >>> basically no resource waiting time, "night-and-

Re: [jira] [Created] (FLINK-13139) Various Hive tests fail on Travis

2019-07-08 Thread Bowen Li
Hi Louis,


Thanks for reporting the issue. The problem is that Hive 2.3.4 is not
compatible with Hadoop 2.4; it requires at least 2.7. I'm not sure yet why the
build succeeded most of the time on Travis CI and only fails occasionally;
maybe it's because the build process somehow has flink-shaded-hadoop-2-uber for
multiple Hadoop 2 versions (2.4, 2.7, etc.) on its classpath.

It's been fixed in FLINK-13134.
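
If anyone wants to verify which Hadoop artifacts actually end up on a module's
classpath, a dependency-tree listing filtered to Hadoop is a quick check (the
module path below is only an example; adjust it to the module under
investigation):

  # Sketch: list Hadoop-related artifacts pulled in by the Hive connector module.
  mvn -pl flink-connectors/flink-connector-hive dependency:tree \
      -Dincludes=org.apache.hadoop,org.apache.flink:flink-shaded-hadoop-2-uber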




On Mon, Jul 8, 2019 at 3:15 AM 不常用邮箱  wrote:

> Hello all:
>
> I meet this problem with can’t import follow class in test file.
> org.apache.flink.table.catalog.hive.HiveCatalog;
> org.apache.flink.table.catalog.hive.HiveTestUtils;
> org.apache.flink.table.catalog.hive.HivePartitionConfig;
> org.apache.flink.table.catalog.hive.HiveCatalogConfig;
> org.apache.flink.table.catalog.hive.HiveDatabaseConfig;
>
> And they even can't import each other. Is some setting wrong with my editor?
> I searched for the problem on Google, which said it may occur when copying
> files.
> Can someone fix this?
>
> Thanks.
> Louis
>
>
>
> --
> Louis
> Email: xu_soft39211...@163.com
>
> > On Jul 8, 2019, at 16:05, Till Rohrmann (JIRA)  wrote:
> >
> > Till Rohrmann created FLINK-13139:
> > -
> >
> > Summary: Various Hive tests fail on Travis
> > Key: FLINK-13139
> > URL: https://issues.apache.org/jira/browse/FLINK-13139
> > Project: Flink
> >  Issue Type: Bug
> >  Components: Connectors / Hive
> >Affects Versions: 1.9.0
> >Reporter: Till Rohrmann
> > Fix For: 1.9.0
> >
> >
> > Various Hive related tests fail on Travis:
> >
> > {code}
> > 06:06:49.654 [ERROR] Errors:
> > 06:06:49.654 [ERROR]   HiveInputFormatTest.createCatalog:66 » Catalog
> Failed to create Hive Metastore...
> > 06:06:49.654 [ERROR]   HiveTableFactoryTest.init:55 » Catalog Failed to
> create Hive Metastore client
> > 06:06:49.654 [ERROR]   HiveTableOutputFormatTest.createCatalog:72 »
> Catalog Failed to create Hive Met...
> > 06:06:49.654 [ERROR]   HiveTableSinkTest.createCatalog:72 » Catalog
> Failed to create Hive Metastore c...
> > 06:06:49.654 [ERROR]   HiveTableSourceTest.createCatalog:67 » Catalog
> Failed to create Hive Metastore...
> > 06:06:49.654 [ERROR]   HiveCatalogGenericMetadataTest.init:49 » Catalog
> Failed to create Hive Metasto...
> > 06:06:49.654 [ERROR]   HiveCatalogHiveMetadataTest.init:55 » Catalog
> Failed to create Hive Metastore ...
> > 06:06:49.654 [ERROR]   HiveGenericUDFTest.testCeil:193->init:387 »
> ExceptionInInitializer
> > 06:06:49.654 [ERROR]   HiveGenericUDFTest.testDecode:160 »
> NoClassDefFound Could not initialize class...
> > 06:06:49.654 [ERROR]
>  HiveSimpleUDFTest.testUDFArray_singleArray:202->init:237 »
> NoClassDefFound Cou...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFBin:60->init:237 »
> NoClassDefFound Could not initiali...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFConv:67->init:237 »
> NoClassDefFound Could not initial...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFJson:85->init:237 »
> NoClassDefFound Could not initial...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFMinute:126->init:237 »
> ExceptionInInitializer
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFRand:51->init:237 »
> NoClassDefFound Could not initial...
> > 06:06:49.654 [ERROR]
>  HiveSimpleUDFTest.testUDFRegExpExtract:153->init:237 » NoClassDefFound
> Could n...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFToInteger:188->init:237
> » NoClassDefFound Could not i...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFUnbase64:166->init:237
> » NoClassDefFound Could not in...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFUnhex:177->init:237 »
> NoClassDefFound Could not initi...
> > 06:06:49.654 [ERROR]   HiveSimpleUDFTest.testUDFWeekOfYear:139->init:237
> » NoClassDefFound Could not ...
> > {code}
> >
> > https://api.travis-ci.org/v3/job/555252043/log.txt
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v7.6.3#76005)
>
>


Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Bowen Li
Congrats, Rong!


On Thu, Jul 11, 2019 at 10:48 AM Oytun Tez  wrote:

> Congratulations Rong!
>
> ---
> Oytun Tez
>
> *M O T A W O R D*
> The World's Fastest Human Translation Platform.
> oy...@motaword.com — www.motaword.com
>
>
> On Thu, Jul 11, 2019 at 1:44 PM Peter Huang 
> wrote:
>
>> Congrats Rong!
>>
>> On Thu, Jul 11, 2019 at 10:40 AM Becket Qin  wrote:
>>
>>> Congrats, Rong!
>>>
>>> On Fri, Jul 12, 2019 at 1:13 AM Xingcan Cui  wrote:
>>>
 Congrats Rong!

 Best,
 Xingcan

 On Jul 11, 2019, at 1:08 PM, Shuyi Chen  wrote:

 Congratulations, Rong!

 On Thu, Jul 11, 2019 at 8:26 AM Yu Li  wrote:

> Congratulations Rong!
>
> Best Regards,
> Yu
>
>
> On Thu, 11 Jul 2019 at 22:54, zhijiang 
> wrote:
>
>> Congratulations Rong!
>>
>> Best,
>> Zhijiang
>>
>> --
>> From:Kurt Young 
> >> Send Time: Thursday, July 11, 2019, 22:54
>> To:Kostas Kloudas 
>> Cc:Jark Wu ; Fabian Hueske ;
>> dev ; user 
>> Subject:Re: [ANNOUNCE] Rong Rong becomes a Flink committer
>>
>> Congratulations Rong!
>>
>> Best,
>> Kurt
>>
>>
>> On Thu, Jul 11, 2019 at 10:53 PM Kostas Kloudas 
>> wrote:
>> Congratulations Rong!
>>
>> On Thu, Jul 11, 2019 at 4:40 PM Jark Wu  wrote:
>> Congratulations Rong Rong!
>> Welcome on board!
>>
>> On Thu, 11 Jul 2019 at 22:25, Fabian Hueske 
>> wrote:
>> Hi everyone,
>>
>> I'm very happy to announce that Rong Rong accepted the offer of the
>> Flink PMC to become a committer of the Flink project.
>>
>> Rong has been contributing to Flink for many years, mainly working on
>> SQL and Yarn security features. He's also frequently helping out on the
>> user@f.a.o mailing lists.
>>
>> Congratulations Rong!
>>
>> Best, Fabian
>> (on behalf of the Flink PMC)
>>
>>
>>



Re: [DISCUSS] Flink project bylaws

2019-07-11 Thread Bowen Li
On Thu, Jul 11, 2019 at 10:38 AM Becket Qin  wrote:

> Thanks everyone for all the comments and feedback. Please see the replies
> below:
>
> 
> Re: Konstantin
>
> > * In addition to a simple "Code Change" we could also add a row for "Code
> > Change requiring a FLIP" with a reference to the FLIP process page. A
> FLIP
> > will have/does have different rules for approvals, etc.
>
>
> Good point. Just added the entry.
>
> ---
> Re: Konstantin
>
> > * For "Code Change" the draft currently requires "one +1 from a committer
> > who has not authored the patch followed by a Lazy approval (not counting
> > the vote of the contributor), moving to lazy majority if a -1 is
> received".
> > In my understanding this means, that a committer always needs a review
> and
> > +1 from another committer. As far as I know this is currently not always
> > the case (often committer authors, contributor reviews & +1s).
> >
>
>
> I think it is worth thinking about how we can make it easy to follow the
> > bylaws e.g. by having more Flink-specific Jira workflows and ticket
> types +
> > corresponding permissions. While this is certainly "Step 2", I believe,
> we
> > really need to make it as easy & transparent as possible, otherwise they
> > will be unintentionally broken.
>
>
> & Re: Till
>
> > For the case of a committer being the author and getting a +1 from a
> > non-committer: I think a committer should know when to ask another
> > committer for feedback or not. Hence, I would not enforce that we
> strictly
> > need a +1 from a committer if the author is a committer but of course
> > encourage it if capacities exist.
>
>
> I am with Robert and Aljoscha on this.
>
> I completely understand the concern here. TBH, in Kafka occasionally
> trivial patches from committers are still merged without following the
> cross-review requirement, but it is rare. That said, I still think an
> additional committer's review makes sense due to the following reasons:
> 1. The bottom line here is that we need to make sure every patch is
> reviewed with a high quality. This is a little difficult to guarantee if
> the review comes from a contributor for many reasons. In some cases, a
> contributor may not have enough knowledge about the project to make a good
> judgement. Also, sometimes the contributors are more eager to get a
> particular issue fixed, so they are willing to lower the review bar.
> 2. One byproduct of such cross review among committers, which I personally
> feel useful, is that it helps gradually form consistent design principles
> and code style. This is because the committers will know how the other
> committers are writing code and learn from each other. So they tend to
> reach some tacit understanding on how things should be done in general.
>
> Another way to think about this is to consider the following two scenarios:
> 1. Reviewing a committer's patch takes a lot of iterations. Then the patch
> needs to be reviewed even if it takes time because there are things
> actually needs to be clarified / changed.
> 2. Reviewing a committer's patch is very smooth and quick, so the patch is
> merged soon. Then reviewing such a patch does not take much time.
>
> Letting another committer review a patch from a committer falls into either
> case 1 or case 2. The best option here is to review the patch, because:
> if it is case 1, the patch actually needs to be reviewed;
> if it is case 2, the review should not take much time anyway.
>
> In contrast, we risk encountering case 1 if we skip the cross-review.
>
> 
> Re: Robert
>
> I replied to your comments in the wiki and made the following modifications
> to resolve some of your comments:
> 1. Added a release manager role section.
> 2. changed the name of "lazy consensus" to "consensus" to align with
> general definition of Apache glossary and other projects.
> 3. review board  -> pull request
>
> -
> Re: Chesnay
>
> The emeritus stuff seems like unnecessary noise.
> >
> As Till mentioned, this is to make sure 2/3 majority is still feasible in
> practice.
>
> There's a bunch of subtle changes in the draft compared to existing
> > "conventions"; we should find a way to highlight these and discuss them
> > one by one.
>
> That is a good suggestion. I am not familiar enough with the current Flink
> convention. Will you help on this? I saw you commented on some part in the
> wiki. Are those complete?
>
> --
>  Re: Aljoscha
>
> How different is this from the Kafka bylaws? I’m asking because I quite
> > like them and wouldn’t mind essentially adopting the Kafka bylaws. I
> mean,
> > it’s open source, and we don’t have to try to re-invent the wheel here.
>
> Ha, you got me on this. The first version of the draft was almost identical
> to Kafka's. But Robert has already caught a few inconsistent places. So it
> might still be worth going through it to make sure we tr

Re: CiBot Update

2019-07-12 Thread Bowen Li
  * only maintains a single comment, updating it for each new build
  * also links in-progress/queued builds, instead of just finished ones.

Want to clarify that the above changes still hold?



On Fri, Jul 12, 2019 at 3:56 PM Chesnay Schepler  wrote:

> Hello all,
>
> on Thursday i pushed an update to the CiBot so that it
>
>   * only maintains a single comment, updating it for each new build
>   * also links in-progress/queued builds, instead of just finished ones.
>
> The update also included a bug that caused the bot to not recognize
> which commits had been verified before, which led to a sharp increase
> in queue times as it repeatedly scheduled builds for the same commit.
> This issue has been fixed, all redundant builds have been removed from
> the queue and all comments have been updated to point to previously
> completed builds.
>
> I apologize for the inconvenience.
>
>


Re: CiBot Update

2019-07-13 Thread Bowen Li
Thanks Chesnay for the update.

A new issue I found is that our bot doesn't seem to update the final CI
status back to github.

E.g. in [1], the CI Report shows "d1aa3f2 : PENDING Build" at the moment,
but the travis build actually passed successfully 14 hours ago [2].

[1] https://github.com/apache/flink/pull/8920#issuecomment-510405859
[2] https://travis-ci.com/flink-ci/flink/builds/119001147



On Fri, Jul 12, 2019 at 11:00 PM Chesnay Schepler 
wrote:

> Yes.
>
> On 13/07/2019 01:56, Bowen Li wrote:
> >* only maintains a single comment, updating it for each new build
> >* also links in-progress/queued builds, instead of just finished ones.
> >
> > Want to clarify that the above changes still hold?
> >
> >
> >
> > On Fri, Jul 12, 2019 at 3:56 PM Chesnay Schepler 
> wrote:
> >
> >> Hello all,
> >>
> >> on Thursday i pushed an update to the CiBot so that it
> >>
> >>* only maintains a single comment, updating it for each new build
> >>* also links in-progress/queued builds, instead of just finished
> ones.
> >>
> >> The update also included a bug that caused the bot to not recognize
> >> which commits had been verified before, which led to a sharp increase
> >> in queue times as it repeatedly scheduled builds for the same commit.
> >> This issue has been fixed, all redundant builds have been removed from
> >> the queue and all comments have been updated to point to previously
> >> completed builds.
> >>
> >> I apologize for the inconvenience.
> >>
> >>
>
>


Re: flink-python failed on Travis

2019-07-17 Thread Bowen Li
Hi Dian,

Is there any update on this? It seems to have been failing for a day.



On Tue, Jul 16, 2019 at 9:35 PM Dian Fu  wrote:

> Thanks for reporting this issue. I will take a look at it.
>
> > On Jul 17, 2019, at 11:50 AM, Danny Chan wrote:
> >
> > I have the same issue ~~
> >
> > Best,
> > Danny Chan
> > On Jul 17, 2019, 11:21 AM +0800, Haibo Sun wrote:
> >> Hi, folks
> >>
> >>
> >> I noticed that all of the Travis tests reported the following failure.
> Is anyone working on this issue?
> >>
> >>
> >> ___ summary
> 
> >> ERROR: py27: InvocationError for command
> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m
> virtualenv --no-download --python
> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/2.7/bin/python2.7
> py27 (exited with code 1)
> >> py33: commands succeeded
> >> ERROR: py34: InvocationError for command
> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m
> virtualenv --no-download --python
> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/3.4/bin/python3.4
> py34 (exited with code 100)
> >> py35: commands succeeded
> >> py36: commands succeeded
> >> py37: commands succeeded
> >> tox checks... [FAILED]
> >> PYTHON exited with EXIT CODE: 1.
> >> Trying to KILL watchdog (12990).
> >>
> >>
> >> Best,
> >> Haibo
>
>


Re: [ANNOUNCE] JIRA permissions changed: Only committers can assign somebody to a ticket

2019-07-18 Thread Bowen Li
Shall we announce this in the user ML too? Users who are used to assigning
tickets to themselves should also be aware of this change.

On Thu, Jul 18, 2019 at 3:06 AM Robert Metzger  wrote:

> Hi all,
>
> The permissions for the FLINK Jira project have been changed [1], to *only
> allow committers and PMC members to assign somebody to a Jira ticket.*
>
> Anybody with a Jira account can be assigned to a ticket. There is no need
> for "Contributor" permissions.
>
> This has been discussed in this mailing list thread [2]. More information
> on the contribution process is available on the Flink website [3].
> The goal of this change is to ensure that discussions happen in the JIRA
> ticket, before implementation work has started.
> I'm encouraging all committers to monitor the Jira tickets created in
> "their" components, drive discussions to a consensus and then assign
> somebody to the ticket (indicating that this change has been agreed upon
> and that somebody will review and merge it).
>
> Best,
> Robert
>
>
>
> [1] https://issues.apache.org/jira/browse/INFRA-18644
> [2]
>
> https://lists.apache.org/thread.html/b39e01d636cffa74c85b2f7405a25ec63a38d47eb6e0133d22873478@%3Cdev.flink.apache.org%3E
> [3] https://flink.apache.org/contributing/contribute-code.html
>


Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Bowen Li
Congrats, Jiangjie!

On Thu, Jul 18, 2019 at 11:07 AM Shuyi Chen  wrote:

> Congrats!
>
> On Thu, Jul 18, 2019 at 10:21 AM Thomas Weise  wrote:
>
> > Congrats!
> >
> >
> > On Thu, Jul 18, 2019 at 9:58 AM Richard Deurwaarder 
> > wrote:
> >
> > > Congrats Becket! :)
> > >
> > > Richard
> > >
> > > On Thu, Jul 18, 2019 at 5:52 PM Xuefu Z  wrote:
> > >
> > > > Congratulation, Becket! At least you're able to assign JIRAs now!
> > > >
> > > > On Thu, Jul 18, 2019 at 8:22 AM Rong Rong 
> wrote:
> > > >
> > > > > Congratulations Becket!
> > > > >
> > > > > --
> > > > > Rong
> > > > >
> > > > > On Thu, Jul 18, 2019 at 7:05 AM Xingcan Cui 
> > > wrote:
> > > > >
> > > > > > Congrats Becket!
> > > > > >
> > > > > > Best,
> > > > > > Xingcan
> > > > > >
> > > > > > On Thu, Jul 18, 2019, 07:17 Dian Fu 
> wrote:
> > > > > >
> > > > > > > Congrats Becket!
> > > > > > >
> > > > > > > > On Jul 18, 2019, at 6:42 PM, Danny Chan wrote:
> > > > > > > >
> > > > > > > >> Congratulations!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > > On Jul 18, 2019, 6:29 PM +0800, Haibo Sun wrote:
> > > > > > > >> Congratulations Becket!
> > > > > > > >> Best,
> > > > > > > >> Haibo
> > > > > > > >> On 2019-07-18 17:51:06, "Hequn Cheng" wrote:
> > > > > > > >>> Congratulations Becket!
> > > > > > > >>>
> > > > > > > >>> Best, Hequn
> > > > > > > >>>
> > > > > > > >>> On Thu, Jul 18, 2019 at 5:34 PM vino yang <
> > > yanghua1...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > >  Congratulations!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Vino
> > > > > > > 
> > > > > > >  Yun Gao wrote on Thursday, July 18, 2019 at 5:31 PM:
> > > > > > > 
> > > > > > > > Congratulations!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yun
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > --
> > > > > > > > From:Kostas Kloudas 
> > > > > > > > Send Time:2019 Jul. 18 (Thu.) 17:30
> > > > > > > > To:dev 
> > > > > > > > Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been
> added
> > > as
> > > > a
> > > > > > >  committer
> > > > > > > > to the Flink project
> > > > > > > >
> > > > > > > > Congratulations Becket!
> > > > > > > >
> > > > > > > > Kostas
> > > > > > > >
> > > > > > > > On Thu, Jul 18, 2019 at 11:21 AM Guowei Ma <
> > > > guowei@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Congrats Becket!
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Guowei
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Terry Wang wrote on Thursday, July 18, 2019 at 5:17 PM:
> > > > > > > >>
> > > > > > > >>> Congratulations Becket!
> > > > > > > >>>
> > > > > > >  On Jul 18, 2019, at 5:09 PM, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
> > > > > > > 
> > > > > > >  Congratulations Becket! Good to have you onboard!
> > > > > > > 
> > > > > > >  On 18/07/2019 10:56, Till Rohrmann wrote:
> > > > > > > > Congrats Becket!
> > > > > > > >
> > > > > > > > On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang <
> > > > > zjf...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Congratulations Becket!
> > > > > > > >>
> > > > > > > >> Xu Forward wrote on Thursday, July 18, 2019 at 4:39 PM:
> > > > > > > >>
> > > > > > > >>> Congratulations Becket! Well deserved.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> Cheers,
> > > > > > > >>>
> > > > > > > >>> forward
> > > > > > > >>>
> > > > > > > >>> Kurt Young wrote on Thursday, July 18, 2019 at 4:20 PM:
> > > > > > > >>>
> > > > > > >  Congrats Becket!
> > > > > > > 
> > > > > > >  Best,
> > > > > > >  Kurt
> > > > > > > 
> > > > > > > 
> > > > > > >  On Thu, Jul 18, 2019 at 4:12 PM JingsongLee <
> > > > > > > >> lzljs3620...@aliyun.com
> > > > > > >  .invalid>
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > > Congratulations Becket!
> > > > > > > >
> > > > > > > > Best, Jingsong Lee
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > --
> > > > > > > > From:Congxian Qiu 
> > > > > > > > Send Time: Thursday, July 18, 2019, 16:09
> > > > > > > > To:dev@flink.apache.org 
> > > > > > > > Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has
> > been
> > > > > added
> > > > > > >  as a
> > > > > > >  committer
> > > > > > > > to the Flink project
> > > > > > > >
> > > > > > > > Congratulations Becket! Well deserved.
> > > > > > > >

[ANNOUNCE] Seattle Flink Meetup at Uber on 8/22

2019-08-12 Thread Bowen Li
Hi All !

Join our next Seattle Flink Meetup at Uber Seattle, featuring talks of
[Flink + Kappa+ @ Uber] and [Flink + Pulsar for streaming-first, unified
data processing].

- TALK #1: Moving from Lambda and Kappa Architectures to Kappa+ with Flink
at Uber
- TALK #2: When Apache Pulsar meets Apache Flink

Checkout event details and RSVP at
https://www.meetup.com/seattle-flink/events/263782233/ . See you soon!

Bowen


Re: [DISCUSS] Repository split

2019-08-12 Thread Bowen Li
-1 for rushing to the conclusion that we need to split the repo before
exhausting our efforts to improve the current build/CI mechanism. Besides all
the build system issues mentioned above (no incremental builds, no
flexibility to build only docs or subsets of components), it's hard to keep
configurations (like code style, permissions, etc.) consistent between repos.

IMHO, one area where we can further improve build performance is the CI bot.
From my experience, a few simple but effective changes we can make are 1)
cancel the previous build when a new commit is submitted (this seems to have
been fixed 10 days ago [1]), and 2) cancel the previous build when the PR is
closed, whether merged or abandoned. And there are many more to come.
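
As a rough illustration of change 2), canceling a stale build through the
Travis API (v3) is a single POST; the build id and token below are placeholders
(the bot would look the id up from the builds it previously scheduled for the
PR):

  # Hypothetical sketch: cancel a running Travis build for a closed/updated PR.
  curl -s -X POST "https://api.travis-ci.com/build/${BUILD_ID}/cancel" \
    -H "Travis-API-Version: 3" \
    -H "Authorization: token ${TRAVIS_TOKEN}"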

Though I like the soft split approach Stephan raised slightly better than
the hard split, I hope that's not the ultimate approach either, **unless
really no better way presents itself**, because it still seems to me that
we are trying to identify dependency graphs **manually** just to make up
for the shortcomings of the build tool. Gradle is surely capable of doing
that, as people mentioned, and I have used that capability before. I looked
into Maven previously but didn't get far due to the lack of good
documentation, and thus I'm not sure whether Maven is "modern" enough for
that task. Hopefully we won't need to reinvent the wheel the hard way just
for the sake of compensating for Maven.

[1]
https://github.com/flink-ci/ci-bot/commit/82bb83fd997fac97405fd956d758af100b0f289c



On Mon, Aug 12, 2019 at 7:44 AM Arvid Heise  wrote:

> I split small and medium-sized repositories in several projects for various
> reasons. In general, the more mature a project, the fewer pain after the
> split. If interfaces are somewhat stable, it's naturally easier to work in
> a distributed manner.
>
> However, projects should be split for the right reasons. Robert pointed the
> most important out: growth of somewhat individual communities. Another
> reason would be that we actually want to force better coverage inside the
> modules (for example, adding tests to the core modules when e2e fail).
> Another reason is to actually slow down development: Make sure that a new
> API endpoint is well-crafted before adding the implementation in some
> module. API changes will occur less, when devs have to adopt it throughout
> several modules and feel the pain of users. Sometimes API changes will
> actually become more visible through separate projects.
> One issue that would be addressed that I currently have is reduced
> complexity while onboarding.
>
> In contrast, other issues can be solved without splitting the repository
> and sacrificing development speed: build times can be lowered with
> company-wide build caches (https://gradle.com/ , also for maven, although
> I
> know only the gradle version).
>
> I think that I have not enough experience with the project yet to cast a
> vote. I made good experiences in the past with splitting (although it takes
> time to pay off), but I see many valid points raised.
>
> I do have a strong opinion on reducing build times though and would be
> avail to explore that, but that sounds like a separate discussion to me.
>
> Best,
>
> Arvid
>
> On Mon, Aug 12, 2019 at 4:26 PM Robert Metzger 
> wrote:
>
> > Thanks a lot for starting the discussion Chesnay!
> >
> >
> > I would like to throw in another aspect into the discussion: What if we
> > consider this repo split as a first step towards making connectors,
> machine
> > learning, gelly, table/SQL? independent projects within the ASF, with
> their
> > own mailing lists, committers and JIRA?
> >
> >
> > Of course, we would not establish the new repos as new projects
> > immediately, but after we have found good boundaries between the projects
> > (interfaces, tests, documentation, communities) (6-24 months)
> >
> >
> > Each project (or repo initially) would create separate releases, and
> depend
> > on stable versions.
> >
> > This allows each project to come up with their own release cadence.
> >
> >
> > Also, the projects could establish their own processes. A connectors
> > project would probably have more turnover in terms of new connector
> > contributions, so something like a “connector incubator” would make
> sense?
> > A “young” machine learning project might benefit from a monthly release
> > model initially.
> >
> > I see this as a way of establishing different standards based on the
> > requirements of each project (the concern of double standards has been
> > voiced)
> >
> >
> > With a clearer “separation of concerns”, the connector project would
> report
> > bugs to upstream Flink, they would fix & test it. In the current setup,
> the
> > bug might just be validated through the connector test. A split would
> force
> > upstream Flink to have a proper test in place.
> >
> >
> > To some extend, Flink is already a project that contains different
> > sub-communities, working on the core, table api or machine learning.
> >
> > Maybe Flink’s growth (from a development per

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Bowen Li
Congratulations Andrey!

On Wed, Aug 14, 2019 at 10:18 PM Rong Rong  wrote:

> Congratulations Andrey!
>
> On Wed, Aug 14, 2019 at 10:14 PM chaojianok  wrote:
>
> > Congratulations Andrey!
> > At 2019-08-14 21:26:37, "Till Rohrmann"  wrote:
> > >Hi everyone,
> > >
> > >I'm very happy to announce that Andrey Zagrebin accepted the offer of
> the
> > >Flink PMC to become a committer of the Flink project.
> > >
> > >Andrey has been an active community member for more than 15 months. He
> has
> > >helped shaping numerous features such as State TTL, FRocksDB release,
> > >Shuffle service abstraction, FLIP-1, result partition management and
> > >various fixes/improvements. He's also frequently helping out on the
> > >user@f.a.o mailing lists.
> > >
> > >Congratulations Andrey!
> > >
> > >Best, Till
> > >(on behalf of the Flink PMC)
> >
>


Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-15 Thread Bowen Li
-1 for RC2.

I found a bug, https://issues.apache.org/jira/browse/FLINK-13741, and I
think it's a blocker. The bug means that currently users who call
`tEnv.listUserDefinedFunctions()` in the Table API or `show functions;` through
SQL are not able to see Flink's built-in functions.

I'm preparing a fix right now.

Bowen


On Thu, Aug 15, 2019 at 8:55 AM Tzu-Li (Gordon) Tai 
wrote:

> Thanks for all the test efforts, verifications and votes so far.
>
> So far, things are looking good, but we still require one more PMC binding
> vote for this RC to be the official release, so I would like to extend the
> vote time for 1 more day, until *Aug. 16th 17:00 CET*.
>
> In the meantime, the release notes for 1.9.0 had only just been finalized
> [1], and could use a few more eyes before closing the vote.
> Any help with checking if anything else should be mentioned there regarding
> breaking changes / known shortcomings would be appreciated.
>
> Cheers,
> Gordon
>
> [1] https://github.com/apache/flink/pull/9438
>
> On Thu, Aug 15, 2019 at 3:58 PM Kurt Young  wrote:
>
> > Great, then I have no other comments on legal check.
> >
> > Best,
> > Kurt
> >
> >
> > On Thu, Aug 15, 2019 at 9:56 PM Chesnay Schepler 
> > wrote:
> >
> > > The licensing items aren't a problem; we don't care about Flink modules
> > > in NOTICE files, and we don't have to update the source-release
> > > licensing since we don't have a pre-built version of the WebUI in the
> > > source.
> > >
> > > On 15/08/2019 15:22, Kurt Young wrote:
> > > > After going through the licenses, I found 2 suspicions but not sure
> if
> > > they
> > > > are
> > > > valid or not.
> > > >
> > > > 1. flink-state-processing-api is packaged in to flink-dist jar, but
> not
> > > > included in
> > > > NOTICE-binary file (the one under the root directory) like other
> > modules.
> > > > 2. flink-runtime-web distributed some JavaScript dependencies through
> > > source
> > > > codes, the licenses and NOTICE file were only updated inside the
> module
> > > of
> > > > flink-runtime-web, but not the NOTICE file and licenses directory
> which
> > > > under
> > > > the  root directory.
> > > >
> > > > Another minor issue I just found is:
> > > > FLINK-13558 tries to include table examples to flink-dist, but I
> cannot
> > > > find it in
> > > > the binary distribution of RC2.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Thu, Aug 15, 2019 at 6:19 PM Kurt Young  wrote:
> > > >
> > > >> Hi Gordon & Timo,
> > > >>
> > > >> Thanks for the feedback, and I agree with it. I will document this
> in
> > > the
> > > >> release notes.
> > > >>
> > > >> Best,
> > > >> Kurt
> > > >>
> > > >>
> > > >> On Thu, Aug 15, 2019 at 6:14 PM Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org>
> > > >> wrote:
> > > >>
> > > >>> Hi Kurt,
> > > >>>
> > > >>> With the same argument as before, given that it is mentioned in the
> > > >>> release
> > > >>> announcement that it is a preview feature, I would not block this
> > > release
> > > >>> because of it.
> > > >>> Nevertheless, it would be important to mention this explicitly in
> the
> > > >>> release notes [1].
> > > >>>
> > > >>> Regards,
> > > >>> Gordon
> > > >>>
> > > >>> [1] https://github.com/apache/flink/pull/9438
> > > >>>
> > > >>> On Thu, Aug 15, 2019 at 11:29 AM Timo Walther 
> > > wrote:
> > > >>>
> > >  Hi Kurt,
> > > 
> > >  I agree that this is a serious bug. However, I would not block the
> > >  release because of this. As you said, there is a workaround and
> the
> > >  `execute()` works in the most common case of a single execution.
> We
> > > can
> > >  fix this in a minor release shortly after.
> > > 
> > >  What do others think?
> > > 
> > >  Regards,
> > >  Timo
> > > 
> > > 
> > >  Am 15.08.19 um 11:23 schrieb Kurt Young:
> > > > HI,
> > > >
> > > > We just find a serious bug around blink planner:
> > > > https://issues.apache.org/jira/browse/FLINK-13708
> > > > When user reused the table environment instance, and call
> `execute`
> > >  method
> > > > multiple times for
> > > > different sql, the later call will trigger the earlier ones to be
> > > > re-executed.
> > > >
> > > > It's a serious bug but seems we also have a work around, which is
> > > >>> never
> > > > reuse the table environment
> > > > object. I'm not sure if we should treat this one as blocker issue
> > of
> > >  1.9.0.
> > > > What's your opinion?
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Thu, Aug 15, 2019 at 2:01 PM Gary Yao 
> > wrote:
> > > >
> > > >> +1 (non-binding)
> > > >>
> > > >> Jepsen test suite passed 10 times consecutively
> > > >>
> > > >> On Wed, Aug 14, 2019 at 5:31 PM Aljoscha Krettek <
> > > >>> aljos...@apache.org>
> > > >> wrote:
> > > >>
> > > >>> +1
> > > >>>
> > > >>> I did some testing on a Google Cloud Dataproc cluster (it gives
> >

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-15 Thread Bowen Li
Hi Jark,

Thanks for letting me know that it's been like this in previous releases.
Though I don't think that's the right behavior, it can be discussed for a
later release. Thus I retract my -1 for RC2.

Bowen


On Thu, Aug 15, 2019 at 7:49 PM Jark Wu  wrote:

> Hi Bowen,
>
> Thanks for reporting this.
> However, I don't think this is an issue. IMO, it is by design.
> The `tEnv.listUserDefinedFunctions()` in Table API and `show functions;` in
> SQL CLI are intended to return only the registered UDFs, not including
> built-in functions.
> This is also the behavior in previous versions.
>
> Best,
> Jark
>
> On Fri, 16 Aug 2019 at 06:52, Bowen Li  wrote:
>
> > -1 for RC2.
> >
> > I found a bug, https://issues.apache.org/jira/browse/FLINK-13741, and I
> > think it's a blocker. The bug means that currently users who call
> > `tEnv.listUserDefinedFunctions()` in the Table API or `show functions;`
> > through SQL are not able to see Flink's built-in functions.
> >
> > I'm preparing a fix right now.
> >
> > Bowen
> >
> >
> > On Thu, Aug 15, 2019 at 8:55 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > Thanks for all the test efforts, verifications and votes so far.
> > >
> > > So far, things are looking good, but we still require one more PMC
> > binding
> > > vote for this RC to be the official release, so I would like to extend
> > the
> > > vote time for 1 more day, until *Aug. 16th 17:00 CET*.
> > >
> > > In the meantime, the release notes for 1.9.0 had only just been
> finalized
> > > [1], and could use a few more eyes before closing the vote.
> > > Any help with checking if anything else should be mentioned there
> > regarding
> > > breaking changes / known shortcomings would be appreciated.
> > >
> > > Cheers,
> > > Gordon
> > >
> > > [1] https://github.com/apache/flink/pull/9438
> > >
> > > On Thu, Aug 15, 2019 at 3:58 PM Kurt Young  wrote:
> > >
> > > > Great, then I have no other comments on legal check.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Thu, Aug 15, 2019 at 9:56 PM Chesnay Schepler  >
> > > > wrote:
> > > >
> > > > > The licensing items aren't a problem; we don't care about Flink
> > modules
> > > > > in NOTICE files, and we don't have to update the source-release
> > > > > licensing since we don't have a pre-built version of the WebUI in
> the
> > > > > source.
> > > > >
> > > > > On 15/08/2019 15:22, Kurt Young wrote:
> > > > > > After going through the licenses, I found 2 suspicions but not
> sure
> > > if
> > > > > they
> > > > > > are
> > > > > > valid or not.
> > > > > >
> > > > > > 1. flink-state-processing-api is packaged in to flink-dist jar,
> but
> > > not
> > > > > > included in
> > > > > > NOTICE-binary file (the one under the root directory) like other
> > > > modules.
> > > > > > 2. flink-runtime-web distributed some JavaScript dependencies
> > through
> > > > > source
> > > > > > codes, the licenses and NOTICE file were only updated inside the
> > > module
> > > > > of
> > > > > > flink-runtime-web, but not the NOTICE file and licenses directory
> > > which
> > > > > > under
> > > > > > the  root directory.
> > > > > >
> > > > > > Another minor issue I just found is:
> > > > > > FLINK-13558 tries to include table examples to flink-dist, but I
> > > cannot
> > > > > > find it in
> > > > > > the binary distribution of RC2.
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Thu, Aug 15, 2019 at 6:19 PM Kurt Young 
> > wrote:
> > > > > >
> > > > > >> Hi Gordon & Timo,
> > > > > >>
> > > > > >> Thanks for the feedback, and I agree with it. I will document
> this
> > > in
> > > > > the
> > > > > >> release notes.
> > > > > >>
> > > > > >> Best,
> > >

Re: [DISCUSS] Reducing build times

2019-08-16 Thread Bowen Li
+1 to Till's points on #2 and #5, especially the potential non-disruptive,
gradual migration approach if we decide to go that route.

To add on, I want to point out that we can actually start with the
flink-shaded project [1], which is a perfect candidate for a PoC. It's much
smaller, totally isolated from and not interfering with the flink project
[2], and it actually covers most of our practical feature requirements for
a build tool - all of which makes it an ideal experimental field.

[1] https://github.com/apache/flink-shaded
[2] https://github.com/apache/flink


On Fri, Aug 16, 2019 at 4:52 AM Till Rohrmann  wrote:

> For the sake of keeping the discussion focused and not cluttering the
> discussion thread I would suggest to split the detailed reporting for
> reusing JVMs to a separate thread and cross linking it from here.
>
> Cheers,
> Till
>
> On Fri, Aug 16, 2019 at 1:36 PM Chesnay Schepler 
> wrote:
>
> > Update:
> >
> > TL;DR: table-planner is a good candidate for enabling fork reuse right
> > away, while flink-tests has the potential for huge savings, but we have
> > to figure out some issues first.
> >
> >
> > Build link: https://travis-ci.org/zentol/flink/builds/572659220
> >
> > 4/8 profiles failed.
> >
> > No speedup in libraries, python, blink_planner, 7 minutes saved in
> > libraries (table-planner).
> >
> > The kafka and connectors profiles both fail in kafka tests due to
> > producer leaks, and no speed up could be confirmed so far:
> >
> > java.lang.AssertionError: Detected producer leak. Thread name:
> > kafka-producer-network-thread | producer-239
> > at org.junit.Assert.fail(Assert.java:88)
> > at
> >
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011ITCase.checkProducerLeak(FlinkKafkaProducer011ITCase.java:677)
> > at
> >
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011ITCase.testFlinkKafkaProducer011FailBeforeNotify(FlinkKafkaProducer011ITCase.java:210)
> >
> >
> > The tests profile failed due to various errors in migration tests:
> >
> > junit.framework.AssertionFailedError: Did not see the expected
> accumulator
> > results within time limit.
> > at
> >
> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSavepoint(TypeSerializerSnapshotMigrationITCase.java:141)
> >
> > *However*, a normal tests run takes 40 minutes, while this one above
> > failed after 19 minutes and is only missing the migration tests (which
> > currently need 6-7 minutes). So we could save somewhere between 15 to 20
> > minutes here.
> >
> >
> > Finally, the misc profiles fails in YARN:
> >
> > java.lang.AssertionError
> > at org.apache.flink.yarn.YARNITCase.setup(YARNITCase.java:64)
> >
> > No significant speedup could be observed in other modules; for
> > flink-yarn-tests we can maybe get a minute or 2 out of it.
> >
> > On 16/08/2019 10:43, Chesnay Schepler wrote:
> > > There appears to be a general agreement that 1) should be looked into;
> > > I've setup a branch with fork reuse being enabled for all tests; will
> > > report back the results.
> > >
> > > On 15/08/2019 09:38, Chesnay Schepler wrote:
> > >> Hello everyone,
> > >>
> > >> improving our build times is a hot topic at the moment so let's
> > >> discuss the different ways how they could be reduced.
> > >>
> > >>
> > >>Current state:
> > >>
> > >> First up, let's look at some numbers:
> > >>
> > >> 1 full build currently consumes 5h of build time total ("total
> > >> time"), and in the ideal case takes about 1h20m ("run time") to
> > >> complete from start to finish. The run time may fluctuate of course
> > >> depending on the current Travis load. This applies both to builds on
> > >> the Apache and flink-ci Travis.
> > >>
> > >> At the time of writing, the current queue time for PR jobs (reminder:
> > >> running on flink-ci) is about 30 minutes (which basically means that
> > >> we are processing builds at the rate that they come in), however we
> > >> are in an admittedly quiet period right now.
> > >> 2 weeks ago the queue times on flink-ci peaked at around 5-6h as
> > >> everyone was scrambling to get their changes merged in time for the
> > >> feature freeze.
> > >>
> > >> (Note: Recently optimizations where added to ci-bot where pending
> > >> builds are canceled if a new commit was pushed to the PR or the PR
> > >> was closed, which should prove especially useful during the rush
> > >> hours we see before feature-freezes.)
> > >>
> > >>
> > >>Past approaches
> > >>
> > >> Over the years we have done rather few things to improve this
> > >> situation (hence our current predicament).
> > >>
> > >> Beyond the sporadic speedup of some tests, the only notable reduction
> > >> in total build times was the introduction of cron jobs, which
> > >> consolidated the per-commit matrix from 4 configurations (different
> > >> scala/hadoop versions) to 1.
> > >>
> > >> The separation into multiple build profiles was only a work-around
> > >> for the

[DISCUSS] Upgrade kinesis connector to Apache 2.0 License and include it in official release

2019-08-19 Thread Bowen Li
Hi all,

A while back we discussed upgrading the flink-connector-kinesis module to
the Apache 2.0 license so that its jar can be published as part of official
Flink releases. Given our large user base running Flink with
Kinesis/DynamoDB streams, this would free users from building and maintaining
the module themselves, and improve the user and developer experience. A ticket
was created [1] but has been idle, mainly because Apache 2.0-licensed releases
of some AWS libraries were not yet available at the time.

As of today, all of flink-connector-kinesis's AWS dependencies have
been updated to the Apache 2.0 license and released. They include:

- aws-java-sdk-kinesis
- aws-java-sdk-sts
- amazon-kinesis-client
- amazon-kinesis-producer (Apache 2.0 from 0.13.1, released 18 days ago) [2]
- dynamodb-streams-kinesis-adapter (Apache 2.0 from 1.5.0, released 7 days
ago) [3]

Therefore, I'd suggest we kick off the initiative and aim for release 1.10,
which is roughly 3 months away, leaving us plenty of time to finish.
According to @Dyana 's comment in the ticket [1], it seems the work involves
large chunks of changes split into multiple parts rather than simply upgrading
lib versions, so we can further break the JIRA down into sub-tasks to limit the
scope of each change and ease code review.

@Dyana would you still be interested in taking ownership and driving the
effort forward?

Thanks,
Bowen

[1] https://issues.apache.org/jira/browse/FLINK-12847
[2] https://github.com/awslabs/amazon-kinesis-producer/releases
[3] https://github.com/awslabs/dynamodb-streams-kinesis-adapter/releases


Re: [DISCUSS] Upgrade kinesis connector to Apache 2.0 License and include it in official release

2019-08-20 Thread Bowen Li
@Stephan @Becket the kinesis connector currently uses KCL 1.9. Extensive
changes would be needed to switch to KCL 2.x. I agree with Dyana that, since
KCL 1.x has also been relicensed to Apache 2.0, we can just focus on upgrading
to a newer KCL 1.x minor version for now.

On Tue, Aug 20, 2019 at 7:52 AM Dyana Rose  wrote:

> ok great,
>
> that's done, the PR is rebased and squashed on top of master and is running
> through Travis
>
> https://github.com/apache/flink/pull/9494
>
> Dyana
>
> On Tue, 20 Aug 2019 at 15:32, Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi Dyana,
> >
> > Regarding your question on the Chinese docs:
> > Since the Chinese counterparts for the Kinesis connector documentation
> > isn't translated yet (see docs/dev/connectors/kinesis.zh.md), for now
> you
> > can simply just sync whatever changes you made to the English doc to the
> > Chinese one as well.
> >
> > Cheers,
> > Gordon
> >
>


Re: [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-20 Thread Bowen Li
+1 non-binding

- built from source with default profile
- manually ran SQL and Table API tests for Flink's metadata integration
with Hive Metastore in a local cluster (a sketch of a similar local setup
follows below)
- manually ran SQL tests for batch capability with the Blink planner and Hive
integration (source/sink/UDF) in a local cluster
- file formats covered: CSV, ORC, Parquet
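
For anyone who wants to do a similar local check, here is a minimal sketch of
registering a HiveCatalog from the Table API. The catalog name, default
database, Hive conf dir, and Hive version below are placeholder values, and
the exact setup may differ per environment; please treat the Hive integration
docs as the authoritative reference.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveCatalogSmokeTest {
    public static void main(String[] args) {
        // Blink planner in batch mode, matching the batch tests mentioned above.
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Placeholder values: catalog name, default database, Hive conf dir, Hive version.
        HiveCatalog hive = new HiveCatalog("myhive", "default", "/opt/hive-conf", "2.3.4");
        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");

        // Query a table whose metadata lives in the Hive Metastore.
        Table result = tableEnv.sqlQuery("SELECT count(*) FROM some_hive_table");
    }
}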


On Tue, Aug 20, 2019 at 10:23 PM Gary Yao  wrote:

> +1 (non-binding)
>
> Reran Jepsen tests 10 times.
>
> On Wed, Aug 21, 2019 at 5:35 AM vino yang  wrote:
>
> > +1 (non-binding)
> >
> > - checkout source code and build successfully
> > - started a local cluster and ran some example jobs successfully
> > - verified signatures and hashes
> > - checked release notes and post
> >
> > Best,
> > Vino
> >
> > Stephan Ewen  于2019年8月21日周三 上午4:20写道:
> >
> > > +1 (binding)
> > >
> > >  - Downloaded the binary release tarball
> > >  - started a standalone cluster with four nodes
> > >  - ran some examples through the Web UI
> > >  - checked the logs
> > >  - created a project from the Java quickstarts maven archetype
> > >  - ran a multi-stage DataSet job in batch mode
> > >  - killed as TaskManager and verified correct restart behavior,
> including
> > > failover region backtracking
> > >
> > >
> > > I found a few issues, and a common theme here is confusing error
> > reporting
> > > and logging.
> > >
> > > (1) When testing batch failover and killing a TaskManager, the job
> > reports
> > > as the failure cause "org.apache.flink.util.FlinkException: The
> assigned
> > > slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
> > > I think that is a pretty bad error message, as a user I don't know
> > what
> > > that means. Some internal book keeping thing?
> > > You need to know a lot about Flink to understand that this means
> > > "TaskManager failure".
> > > https://issues.apache.org/jira/browse/FLINK-13805
> > > I would not block the release on this, but think this should get
> > pretty
> > > urgent attention.
> > >
> > > (2) The Metric Fetcher floods the log with error messages when a
> > > TaskManager is lost.
> > >  There are many exceptions being logged by the Metrics Fetcher due
> to
> > > not reaching the TM any more.
> > >  This pollutes the log and drowns out the original exception and
> the
> > > meaningful logs from the scheduler/execution graph.
> > >  https://issues.apache.org/jira/browse/FLINK-13806
> > >  Again, I would not block the release on this, but think this
> should
> > > get pretty urgent attention.
> > >
> > > (3) If you put "web.submit.enable: false" into the configuration, the
> web
> > > UI will still display the "SubmitJob" page, but errors will
> > > continuously pop up, stating "Unable to load requested file /jars."
> > > https://issues.apache.org/jira/browse/FLINK-13799
> > >
> > > (4) REST endpoint logs ERROR level messages when selecting the
> > > "Checkpoints" tab for batch jobs. That does not seem correct.
> > >  https://issues.apache.org/jira/browse/FLINK-13795
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > >
> > >
> > > On Tue, Aug 20, 2019 at 11:32 AM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Legal checks:
> > > > - verified signatures and hashes
> > > > - New bundled Javascript dependencies for flink-runtime-web are
> > correctly
> > > > reflected under licenses-binary and NOTICE file.
> > > > - locally built from source (Scala 2.12, without Hadoop)
> > > > - No missing artifacts in staging repo
> > > > - No binaries in source release
> > > >
> > > > Functional checks:
> > > > - Quickstart working (both in IDE + job submission)
> > > > - Simple State Processor API program that performs offline key schema
> > > > migration (RocksDB backend). Generated savepoint is valid to restore
> > > from.
> > > > - All E2E tests pass locally
> > > > - Didn’t notice any issues with the new WebUI
> > > >
> > > > Cheers,
> > > > Gordon
> > > >
> > > > On Tue, Aug 20, 2019 at 3:53 AM Zili Chen 
> > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > - build from source: OK(8u212)
> > > > > - check local setup tutorial works as expected
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > Yu Li  于2019年8月20日周二 上午8:24写道:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > - checked release notes: OK
> > > > > > - checked sums and signatures: OK
> > > > > > - repository appears to contain all expected artifacts
> > > > > > - source release
> > > > > >  - contains no binaries: OK
> > > > > >  - contains no 1.9-SNAPSHOT references: OK
> > > > > >  - build from source: OK (8u102)
> > > > > > - binary release
> > > > > >  - no examples appear to be missing
> > > > > >  - started a cluster; WebUI reachable, example ran
> successfully
> > > > > > - checked README.md file and found nothing unexpected
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Tue, 20 Aug 2019 at 01:16, Tzu

[DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-08-27 Thread Bowen Li
Hi folks,

I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
It will be critically helpful for improving function usability in SQL.

https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing

In short, it:
- adds support for precise function reference with fully/partially
qualified name
- redefines function resolution order for ambiguous function reference
- adds support for Hive's rich built-in functions (support for Hive user
defined functions was already added in 1.9.0)
- clarifies the concept of temporary functions

Would love to hear your thoughts.

Bowen


Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-08-27 Thread Bowen Li
Hi Jincheng and Dian,

Sorry for being late to the party. I took a glance at the proposal; it LGTM
in general, and I left only a couple of comments.

Thanks,
Bowen


On Mon, Aug 26, 2019 at 8:05 PM Dian Fu  wrote:

> Hi Jincheng,
>
> Thanks! It works.
>
> Thanks,
> Dian
>
> > 在 2019年8月27日,上午10:55,jincheng sun  写道:
> >
> > Hi Dian, can you check if you have edit access? :)
> >
> >
> > Dian Fu  于2019年8月26日周一 上午10:52写道:
> >
> >> Hi Jincheng,
> >>
> >> Appreciated for the kind tips and offering of help. Definitely need it!
> >> Could you grant me write permission for confluence? My Id: Dian Fu
> >>
> >> Thanks,
> >> Dian
> >>
> >>> 在 2019年8月26日,上午9:53,jincheng sun  写道:
> >>>
> >>> Thanks for your feedback Hequn & Dian.
> >>>
> >>> Dian, I am glad to see that you want help to create the FLIP!
> >>> Everyone will have first time, and I am very willing to help you
> complete
> >>> your first FLIP creation. Here some tips:
> >>>
> >>> - First I'll give your account write permission for confluence.
> >>> - Before create the FLIP, please have look at the FLIP Template [1],
> >> (It's
> >>> better to know more about FLIP by reading [2])
> >>> - Create Flink Python UDFs related JIRAs after completing the VOTE of
> >>> FLIP.(I think you also can bring up the VOTE thread, if you want! )
> >>>
> >>> Any problems you encounter during this period,feel free to tell me that
> >> we
> >>> can solve them together. :)
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>>
> >>>
> >>>
> >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> >>> [2]
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >>>
> >>>
> >>> Hequn Cheng  于2019年8月23日周五 上午11:54写道:
> >>>
>  +1 for starting the vote.
> 
>  Thanks Jincheng a lot for the discussion.
> 
>  Best, Hequn
> 
>  On Fri, Aug 23, 2019 at 10:06 AM Dian Fu 
> wrote:
> 
> > Hi Jincheng,
> >
> > +1 to start the FLIP create and VOTE on this feature. I'm willing to
> >> help
> > on the FLIP create if you don't mind. As I haven't created a FLIP
> >> before,
> > it will be great if you could help on this. :)
> >
> > Regards,
> > Dian
> >
> >> 在 2019年8月22日,下午11:41,jincheng sun  写道:
> >>
> >> Hi all,
> >>
> >> Thanks a lot for your feedback. If there are no more suggestions and
> >> comments, I think it's better to  initiate a vote to create a FLIP
> for
> >> Apache Flink Python UDFs.
> >> What do you think?
> >>
> >> Best, Jincheng
> >>
> >> jincheng sun  于2019年8月15日周四 上午12:54写道:
> >>
> >>> Hi Thomas,
> >>>
> >>> Thanks for your confirmation and the very important reminder about
> > bundle
> >>> processing.
> >>>
> >>> I have had add the description about how to perform bundle
> processing
> > from
> >>> the perspective of checkpoint and watermark. Feel free to leave
> > comments if
> >>> there are anything not describe clearly.
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>>
> >>> Dian Fu  于2019年8月14日周三 上午10:08写道:
> >>>
>  Hi Thomas,
> 
>  Thanks a lot the suggestions.
> 
>  Regarding to bundle processing, there is a section "Checkpoint"[1]
> >> in
> > the
>  design doc which talks about how to handle the checkpoint.
>  However, I think you are right that we should talk more about it,
>  such
> > as
>  what's bundle processing, how it affects the checkpoint and
>  watermark,
> > how
>  to handle the checkpoint and watermark, etc.
> 
>  [1]
> 
> >
> 
> >>
> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>  <
> 
> >
> 
> >>
> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
> >
> 
>  Regards,
>  Dian
> 
> > 在 2019年8月14日,上午1:01,Thomas Weise  写道:
> >
> > Hi Jincheng,
> >
> > Thanks for putting this together. The proposal is very detailed,
>  thorough
> > and for me as a Beam Flink runner contributor easy to understand
> :)
> >
> > One thing that you should probably detail more is the bundle
>  processing. It
> > is critically important for performance that multiple elements
> are
> > processed in a bundle. The default bundle size in the Flink
> runner
>  is
>  1s or
> > 1000 elements, whichever comes first. And for streaming, you can
>  find
>  the
> > logic necessary to align the bundle processing with watermarks
> and
> > checkpointing here:
> >
> 
> >
> 
> >>
> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-08-29 Thread Bowen Li
Thanks everyone for the feedback.

I have updated the document accordingly. Here's a summary of the changes:

- clarify the concept of temporary functions, to facilitate deciding the
function resolution order
- provide two options to support Hive built-in functions, with the 2nd one
being preferred
- add detailed prototype code for FunctionCatalog#lookupFunction(name)
- move the section "rename existing FunctionCatalog APIs in favor of
temporary functions" out of the scope of the FLIP
- add another reasonable limitation for function resolution: we do not
consider resolving overloaded functions - those with the same name but
different params. (It's still valid to have a single function with
overloaded eval() methods)

Please take another look.
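
For readers who don't have access to the doc, here is a rough, illustrative
Java sketch (not the actual prototype code in the doc) of the ambiguous-name
resolution order as proposed at this point of the discussion - temporary
functions, then built-in functions, then catalog functions in the current
catalog/database. All type and method names below are made up for the example.

import java.util.Map;
import java.util.Optional;

// Illustrative only: lookup order for an ambiguous (unqualified) function name.
class FunctionResolutionSketch {

    private final Map<String, Object> temporaryFunctions;      // session-scoped, per user
    private final Map<String, Object> builtInFunctions;        // Flink built-in functions
    private final Map<String, Object> currentCatalogFunctions; // functions in current catalog/db

    FunctionResolutionSketch(Map<String, Object> temporaryFunctions,
                             Map<String, Object> builtInFunctions,
                             Map<String, Object> currentCatalogFunctions) {
        this.temporaryFunctions = temporaryFunctions;
        this.builtInFunctions = builtInFunctions;
        this.currentCatalogFunctions = currentCatalogFunctions;
    }

    Optional<Object> lookupFunction(String name) {
        if (temporaryFunctions.containsKey(name)) {
            return Optional.of(temporaryFunctions.get(name));          // 1. temporary functions
        }
        if (builtInFunctions.containsKey(name)) {
            return Optional.of(builtInFunctions.get(name));            // 2. built-in functions
        }
        return Optional.ofNullable(currentCatalogFunctions.get(name)); // 3. catalog functions
    }
}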

Thanks,
Bowen

On Tue, Aug 27, 2019 at 11:49 AM Bowen Li  wrote:

> Hi folks,
>
> I'd like to kick off a discussion on reworking Flink's FunctionCatalog.
> It's critically helpful to improve function usability in SQL.
>
>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
>
> In short, it:
> - adds support for precise function reference with fully/partially
> qualified name
> - redefines function resolution order for ambiguous function reference
> - adds support for Hive's rich built-in functions (support for Hive user
> defined functions was already added in 1.9.0)
> - clarifies the concept of temporary functions
>
> Would love to hear your thoughts.
>
> Bowen
>


[ANNOUNCE] Kinesis connector becomes part of Flink releases

2019-08-30 Thread Bowen Li
Hi all,

I'm glad to announce that, as PR #9494 was merged today,
flink-connector-kinesis is officially under the Apache 2.0 license in the
master branch, and its artifact will be deployed to Maven central as part of
Flink releases starting from Flink 1.10.0. Users can then use the artifact off
the shelf and no longer have to build and maintain it on their own.
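
To illustrate what this means for users, a minimal sketch of consuming a
Kinesis stream with the released connector artifact might look like the
following once 1.10.0 is out. The stream name, region, and initial position
below are placeholders; please check the Kinesis connector documentation for
the authoritative configuration keys and artifact coordinates.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisConsumerExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties consumerConfig = new Properties();
        consumerConfig.setProperty(ConsumerConfigConstants.AWS_REGION, "us-west-2");          // placeholder region
        consumerConfig.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST"); // start from latest records

        env.addSource(new FlinkKinesisConsumer<>(
                        "my-kinesis-stream",        // placeholder stream name
                        new SimpleStringSchema(),
                        consumerConfig))
           .print();

        env.execute("Kinesis consumer example");
    }
}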

It brings a much better user experience to our large AWS customer base by
making their work simpler, smoother, and more productive!

Thanks everyone who participated in coding and review to drive this
initiative forward.

Cheers,
Bowen


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Bowen Li
Hi Kurt,

Re: > What I want to propose is we can merge #3 and #4, make them both under
>"catalog" concept, by extending catalog function to make it have ability to
>have built-in catalog functions. Some benefits I can see from this
approach:
>1. We don't have to introduce new concept like external built-in functions.
>Actually I don't see a full story about how to treat a built-in functions,
and it
>seems a little bit disrupt with catalog. As a result, you have to make
some restriction
>like "hive built-in functions can only be used when current catalog is
hive catalog".

Yes, I've unified #3 and #4 but it seems I didn't update some part of the
doc. I've modified those sections, and they are up to date now.

In short, built-in functions of external systems are now defined as a
special kind of catalog function in Flink, and are handled by Flink as
follows:
- An external built-in function must be associated with a catalog for the
purpose of decoupling flink-table and external systems.
- It always resides in front of catalog functions in ambiguous function
reference order, just like in its own external system
- It is a special catalog function that doesn’t have a schema/database
namespace
- It goes through the same instantiation logic as other user-defined catalog
functions in the external system

Please take another look at the doc, and let me know if you have more
questions.


On Tue, Sep 3, 2019 at 7:28 AM Timo Walther  wrote:

> Hi Kurt,
>
> it should not affect the functions and operations we currently have in
> SQL. It just categorizes the available built-in functions. It is kind of
> an orthogonal concept to the catalog API but built-in functions deserve
> this special kind of treatment. CatalogFunction still fits perfectly in
> there because the regular catalog object resolution logic is not
> affected. So tables and functions are resolved in the same way but with
> built-in functions that have priority as in the original design.
>
> Regards,
> Timo
>
>
> On 03.09.19 15:26, Kurt Young wrote:
> > Does this only affect the functions and operations we currently have in
> SQL
> > and
> > have no effect on tables, right? Looks like this is an orthogonal concept
> > with Catalog?
> > If the answer are both yes, then the catalog function will be a weird
> > concept?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan  wrote:
> >
> >> The way you proposed are basically the same as what Calcite does, I
> think
> >> we are in the same line.
> >>
> >> Best,
> >> Danny Chan
> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther ,写道:
> >>> This sounds exactly as the module approach I mentioned, no?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>> On 03.09.19 13:42, Danny Chan wrote:
> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
> >> refactoring to make our function usage more user friendly.
> >>>> For the topic of how to organize the builtin operators and operators
> >> of Hive, here is a solution from Apache Calcite, the Calcite way is to
> make
> >> every dialect operators a “Library”, user can specify which libraries
> they
> >> want to use for a sql query. The builtin operators always comes as the
> >> first class objects and the others are used from the order they appears.
> >> Maybe you can take a reference.
> >>>> [1]
> >>
> https://github.com/apache/calcite/commit/9a4eab5240d96379431d14a1ac33bfebaf6fbb28
> >>>> Best,
> >>>> Danny Chan
> >>>> 在 2019年8月28日 +0800 AM2:50,Bowen Li ,写道:
> >>>>> Hi folks,
> >>>>>
> >>>>> I'd like to kick off a discussion on reworking Flink's
> >> FunctionCatalog.
> >>>>> It's critically helpful to improve function usability in SQL.
> >>>>>
> >>>>>
> >>
> https://docs.google.com/document/d/1w3HZGj9kry4RsKVCduWp82HkW6hhgi2unnvOAUS72t8/edit?usp=sharing
> >>>>> In short, it:
> >>>>> - adds support for precise function reference with fully/partially
> >>>>> qualified name
> >>>>> - redefines function resolution order for ambiguous function
> >> reference
> >>>>> - adds support for Hive's rich built-in functions (support for Hive
> >> user
> >>>>> defined functions was already added in 1.9.0)
> >>>>> - clarifies the concept of temporary functions
> >>>>>
> >>>>> Would love to hear your thoughts.
> >>>>>
> >>>>> Bowen
> >>>
>
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Bowen Li
Hi Jingsong,

Re> 1.Hive built-in functions is an intermediate solution. So we should
> not introduce interfaces to influence the framework. To make
> Flink itself more powerful, we should implement the functions
> we need to add.

Yes, please see the doc.

Re> 2.Non-flink built-in functions are easy for users to change their
> behavior. If we support some flink built-in functions in the
> future but act differently from non-flink built-in, this will lead to
> changes in user behavior.

There's no such concept as "external built-in functions" any more. Built-in
functions of external systems will be treated as special catalog functions.

Re> Another question is, does this fallback include all
> hive built-in functions? As far as I know, some hive functions
> have some hacky. If possible, can we start with a white list?
> Once we implement some functions to flink built-in, we can
> also update the whitelist.

Yes, that's something we thought of too. I don't think it's super critical
to the scope of this FLIP, thus I'd like to leave it to future efforts as a
nice-to-have feature.


On Tue, Sep 3, 2019 at 1:37 PM Bowen Li  wrote:

> Hi Kurt,
>
> Re: > What I want to propose is we can merge #3 and #4, make them both
> under
> >"catalog" concept, by extending catalog function to make it have ability
> to
> >have built-in catalog functions. Some benefits I can see from this
> approach:
> >1. We don't have to introduce new concept like external built-in
> functions.
> >Actually I don't see a full story about how to treat a built-in
> functions, and it
> >seems a little bit disrupt with catalog. As a result, you have to make
> some restriction
> >like "hive built-in functions can only be used when current catalog is
> hive catalog".
>
> Yes, I've unified #3 and #4 but it seems I didn't update some part of the
> doc. I've modified those sections, and they are up to date now.
>
> In short, now built-in function of external systems are defined as a
> special kind of catalog function in Flink, and handled by Flink as
> following:
> - An external built-in function must be associated with a catalog for the
> purpose of decoupling flink-table and external systems.
> - It always resides in front of catalog functions in ambiguous function
> reference order, just like in its own external system
> - It is a special catalog function that doesn’t have a schema/database
> namespace
> - It goes thru the same instantiation logic as other user defined catalog
> functions in the external system
>
> Please take another look at the doc, and let me know if you have more
> questions.
>
>
> On Tue, Sep 3, 2019 at 7:28 AM Timo Walther  wrote:
>
>> Hi Kurt,
>>
>> it should not affect the functions and operations we currently have in
>> SQL. It just categorizes the available built-in functions. It is kind of
>> an orthogonal concept to the catalog API but built-in functions deserve
>> this special kind of treatment. CatalogFunction still fits perfectly in
>> there because the regular catalog object resolution logic is not
>> affected. So tables and functions are resolved in the same way but with
>> built-in functions that have priority as in the original design.
>>
>> Regards,
>> Timo
>>
>>
>> On 03.09.19 15:26, Kurt Young wrote:
>> > Does this only affect the functions and operations we currently have in
>> SQL
>> > and
>> > have no effect on tables, right? Looks like this is an orthogonal
>> concept
>> > with Catalog?
>> > If the answer are both yes, then the catalog function will be a weird
>> > concept?
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Tue, Sep 3, 2019 at 8:10 PM Danny Chan  wrote:
>> >
>> >> The way you proposed are basically the same as what Calcite does, I
>> think
>> >> we are in the same line.
>> >>
>> >> Best,
>> >> Danny Chan
>> >> 在 2019年9月3日 +0800 PM7:57,Timo Walther ,写道:
>> >>> This sounds exactly as the module approach I mentioned, no?
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>> On 03.09.19 13:42, Danny Chan wrote:
>> >>>> Thanks Bowen for bring up this topic, I think it’s a useful
>> >> refactoring to make our function usage more user friendly.
>> >>>> For the topic of how to organize the builtin operators and operators
>> >> of Hive, here is a solution from Apache Calcite, the Calcite way is to
>> make
>&

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Bowen Li
Hi Timo,

Re> 1) We should not have the restriction "hive built-in functions can only
> be used when current catalog is hive catalog". Switching a catalog
> should only have implications on the cat.db.object resolution but not
> functions. It would be quite convinient for users to use Hive built-ins
> even if they use a Confluent schema registry or just the in-memory
catalog.

There might be a misunderstanding here.

First of all, Hive built-in functions are not part of Flink's built-in
functions; they are catalog functions. Thus, if the current catalog is not a
HiveCatalog but, say, a schema registry catalog, an ambiguous function
reference just shouldn't be resolved against a different catalog.

Second, Hive built-in functions can potentially be referenced across
catalogs, but they don't have a db namespace and we currently don't have
a SQL syntax for that. It can be enabled once such a SQL syntax is defined,
e.g. "catalog::function", but that is out of scope for this FLIP.

2) I would propose to have separate concepts for catalog and built-in
functions. In particular it would be nice to modularize built-in
functions. Some built-in functions are very crucial (like AS, CAST,
MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
we add more experimental functions in the future or function for some
special application area (Geo functions, ML functions). A data platform
team might not want to make every built-in function available. Or a
function module like ML functions is in a different Maven module.

I think this is orthogonal to this FLIP, especially since we don't have
"external built-in functions" anymore and the built-in function
category currently remains untouched.

But just to share some thoughts on the proposal, I'm not sure about it:
- I don't know if any other databases handle built-in functions like that.
Maybe you can give some examples? IMHO, built-in functions are system info
and should be deterministic, not dependent on loaded libraries. Geo
functions should either be built-in already or just be library functions,
and library functions can be adapted to the catalog APIs or exposed via some
other syntax
- I don't know if all the use cases hold, and many can be achieved by other
approaches too. E.g. experimental functions can be handled well by
documentation, annotations, etc.
- the proposal basically introduces a concept like a pluggable built-in
function catalog, despite the already existing catalog APIs
- it brings even more complicated scenarios into the design. E.g. how do
you handle built-in functions that live in different modules but under
different names?

In short, I'm not sure the proposal really holds up, and it looks like
overkill to me. I'd rather not go down that route. Related discussion can be
on its own thread.

3) Following the suggestion above, we can have a separate discovery
mechanism for built-in functions. Instead of just going through a static
list like in BuiltInFunctionDefinitions, a platform team should be able
to select function modules like
catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
HiveFunctions) or via service discovery;

Same as above. I'll leave it to its own thread.

re > 3) Dawid and I discussed the resulution order again. I agree with Kurt
> that we should unify built-in function (external or internal) under a
> common layer. However, the resolution order should be:
>   1. built-in functions
>   2. temporary functions
>   3. regular catalog resolution logic
> Otherwise a temporary function could cause clashes with Flink's built-in
> functions. If you take a look at other vendors, like SQL Server they
> also do not allow to overwrite built-in functions.

"I agree with Kurt that we should unify built-in function (external or
internal) under a common layer." <- I don't think this is what Kurt means.
Kurt and I are in favor of unifying built-in functions of external systems
and catalog functions. Was that a typo?

Besides, I'm not sure about the resolution order you proposed. Temporary
functions have the lifespan of a session and are only visible to the
session owner; they are unique to each user, and users create them on
purpose to take the highest priority in order to override system behavior
(built-in functions in this case).

In your proposal, why would users name a temporary function the same as a
built-in function at all? Since an ambiguous function reference to that name
would always resolve to the built-in function, creating a same-named temp
function would be meaningless in the end.


On Tue, Sep 3, 2019 at 1:44 PM Bowen Li  wrote:

> Hi Jingsong,
>
> Re> 1.Hive built-in functions is an intermediate solution. So we should
> > not introduce interfaces to influence the framework. To make
> > Flink itself more powerful, we should implement the functions
> > we need to add.
>
> Y

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Bowen Li
Hi all,

Thanks for the feedback. Just a kind reminder that the [Proposal] section
in the Google doc has been updated; please take a look first and let me know
if you have more questions.

On Tue, Sep 3, 2019 at 4:57 PM Bowen Li  wrote:

> Hi Timo,
>
> Re> 1) We should not have the restriction "hive built-in functions can
> only
> > be used when current catalog is hive catalog". Switching a catalog
> > should only have implications on the cat.db.object resolution but not
> > functions. It would be quite convinient for users to use Hive built-ins
> > even if they use a Confluent schema registry or just the in-memory
> catalog.
>
> There might be a misunderstanding here.
>
> First of all, Hive built-in functions are not part of Flink built-in
> functions, they are catalog functions, thus if the current catalog is not a
> HiveCatalog but, say, a schema registry catalog, ambiguous functions
> reference just shouldn't be resolved to a different catalog.
>
> Second, Hive built-in functions can potentially be referenced across
> catalog, but it doesn't have db namespace and we currently just don't have
> a SQL syntax for it. It can be enabled when such a SQL syntax is defined,
> e.g. "catalog::function", but it's out of scope of this FLIP.
>
> 2) I would propose to have separate concepts for catalog and built-in
> functions. In particular it would be nice to modularize built-in
> functions. Some built-in functions are very crucial (like AS, CAST,
> MINUS), others are more optional but stable (MD5, CONCAT_WS), and maybe
> we add more experimental functions in the future or function for some
> special application area (Geo functions, ML functions). A data platform
> team might not want to make every built-in function available. Or a
> function module like ML functions is in a different Maven module.
>
> I think this is orthogonal to this FLIP, especially we don't have the
> "external built-in functions" anymore and currently the built-in function
> category remains untouched.
>
> But just to share some thoughts on the proposal, I'm not sure about it:
> - I don't know if any other databases handle built-in functions like that.
> Maybe you can give some examples? IMHO, built-in functions are system info
> and should be deterministic, not depending on loaded libraries. Geo
> functions should be either built-in already or just libraries functions,
> and library functions can be adapted to catalog APIs or of some other
> syntax to use
> - I don't know if all use cases stand, and many can be achieved by other
> approaches too. E.g. experimental functions can be taken good care of by
> documentations, annotations, etc
> - the proposal basically introduces some concept like a pluggable built-in
> function catalog, despite the already existing catalog APIs
> - it brings in even more complicated scenarios to the design. E.g. how do
> you handle built-in functions in different modules but different names?
>
> In short, I'm not sure if it really stands and it looks like an overkill
> to me. I'd rather not go to that route. Related discussion can be on its
> own thread.
>
> 3) Following the suggestion above, we can have a separate discovery
> mechanism for built-in functions. Instead of just going through a static
> list like in BuiltInFunctionDefinitions, a platform team should be able
> to select function modules like
> catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
> HiveFunctions) or via service discovery;
>
> Same as above. I'll leave it to its own thread.
>
> re > 3) Dawid and I discussed the resulution order again. I agree with
> Kurt
> > that we should unify built-in function (external or internal) under a
> > common layer. However, the resolution order should be:
> >   1. built-in functions
> >   2. temporary functions
> >   3. regular catalog resolution logic
> > Otherwise a temporary function could cause clashes with Flink's built-in
> > functions. If you take a look at other vendors, like SQL Server they
> > also do not allow to overwrite built-in functions.
>
> ”I agree with Kurt that we should unify built-in function (external or
> internal) under a common layer.“ <- I don't think this is what Kurt means.
> Kurt and I are in favor of unifying built-in functions of external systems
> and catalog functions. Did you type a mistake?
>
> Besides, I'm not sure about the resolution order you proposed. Temporary
> functions have a lifespan over a session and are only visible to the
> session owner, they are unique to each user, and users create them on
> purpose to be the highest priority in order to overwrite system in

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Bowen Li
Hi,

I agree with Xuefu that the controversy is mainly around two
points. My thoughts on them:

1) Determinism of referencing Hive built-in functions. We can either remove
Hive built-in functions from ambiguous function resolution and require
users to use a special syntax for their qualified names, or add a config flag
to the catalog constructor/yaml for turning Hive built-in functions on and off,
with the flag set to 'false' by default and proper docs added to help users
make their decision.

2) Flink temp functions vs. Flink built-in functions in the ambiguous function
resolution order. We believe Flink temp functions should precede Flink
built-in functions, and I have presented my reasons. In case we
cannot reach an agreement, I propose forbidding users from registering temp
functions with the same name as a built-in function, like MySQL's approach,
for the moment. It won't raise any performance concern, since built-in
functions are all in memory, so the cost of a name check is trivial. (A toy
sketch of such a registration-time check follows below.)
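
To make option 2 concrete, here is a toy sketch of such a registration-time
guard. The class, method, and exception names are made up for illustration;
the real check would live wherever temporary functions get registered.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy sketch: forbid temporary functions that would shadow a built-in function name.
class TemporaryFunctionRegistrySketch {

    private final Set<String> builtInFunctionNames;             // e.g. "cast", "concat", ...
    private final Map<String, Object> temporaryFunctions = new HashMap<>();

    TemporaryFunctionRegistrySketch(Set<String> builtInFunctionNames) {
        this.builtInFunctionNames = builtInFunctionNames;
    }

    void registerTemporaryFunction(String name, Object function) {
        // In-memory name check; the cost is trivial compared to the registration itself.
        if (builtInFunctionNames.contains(name.toLowerCase())) {
            throw new IllegalArgumentException(
                    "Temporary function '" + name + "' would shadow a built-in function, which is not allowed.");
        }
        temporaryFunctions.put(name.toLowerCase(), function);
    }
}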


On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z  wrote:

> From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function --> catalog
> function vs flink built-in function --> temp function -> catalog function.
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the highest
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a fully
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement on referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic even
> though different approaches are proposed. To limit the scope and simply the
> usage, it seems making sense to me to introduce special syntax for user  to
> explicitly reference an external built-in function such as hive1::sqrt or
> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API call
> hive1.getFunction(ObjectPath functionName) where the database name is
> absent for bulit-in functions available in that catalog hive1. I understand
> that Bowen's original proposal was trying to avoid this, but this could
> turn out to be a clean and simple solution.
>
> (Timo's modular approach is great way to "expand" Flink's built-in function
> set, which seems orthogonal and complementary to this, which could be
> tackled in further future work.)
>
> I'd be happy to hear further thoughts on the two points.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young  wrote:
>
> > Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
> > same
> > as Bowen's. But after thinking about it, I'm currently lean to Timo's
> > suggestion.
> >
> > The reason is backward compatibility. If we follow Bowen's approach,
> let's
> > say we
> > first find function in Flink's built-in functions, and then hive's
> > built-in. For example, `foo`
> > is not supported by Flink, but hive has such built-in function. So user
> > will have hive's
> > behavior for function `foo`. And in next release, Flink realize this is a
> > very popular function
> > and add it into Flink's built-in functions, but with different behavior
> as
> > hive's. So in next
> > release, the behavior changes.
> >
> > With Timo's approach, IIUC user have to tell the framework explicitly
> what
> > kind of
> > built-in functions he would like to use. He can just tell framework to
> > abandon Flink's built-in
> > functions, and use hive's instead. User can only choose between them, but
> > not use
> > them at the same time. I think this approach is more predictable.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Sep 4, 2019 at 8:00 AM Bowen Li  wrote:
> >
> > > Hi all,
> > >
> > > Thanks for the feedback. Just a kindly reminder that the [Proposal]
> > section
> > > in the google doc was updated, please take a look first and let me know
> > if
> > > you have more questions.
> > >
> > > On Tue, Sep 3, 2019 at 4:57 PM Bowen Li  wrote:
> > >
> > > > Hi Timo,
> > > >
> > > > Re> 1) We should not have the restriction "hive built-in functions
> can
> > > > only
> > > > > be used when

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-04 Thread Bowen Li
or. (
> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of lack of standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity, The
> downside
> is the deviation from Hive, which is popular and de facto standard in big
> data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz 
> 
> wrote:
>
> Hi all,
>
> Just an opinion on the built-in <> temporary functions resolution and
> NAMING issue. I think we should not allow overriding the built-in
> functions, as this may pose serious issues and to be honest is rather
> not feasible and would require major rework. What happens if a user
> wants to override CAST? Calls to that function are generated at
> different layers of the stack that unfortunately does not always go
> through the Catalog API (at least yet). Moreover from what I've checked
> no other systems allow overriding the built-in functions. All the
> systems I've checked so far register temporary functions in a
> database/schema (either special database for temporary functions, or
> just current database). What I would suggest is to always register
> temporary functions with a 3 part identifier. The same way as tables,
> views etc. This effectively means you cannot override built-in
> functions. With such approach it is natural that the temporary functions
> end up a step lower in the resolution order:
>
> 1. built-in functions (1 part, maybe 2? - this is still under discussion)
>
> 2. temporary functions (always 3 part path)
>
> 3. catalog functions (always 3 part path)
>
> Let me know what do you think.
>
> Best,
>
> Dawid
>
> On 04/09/2019 06:13, Bowen Li wrote:
>
> Hi,
>
> I agree with Xuefu that the main controversial points are mainly the
>
> two
>
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either
>
> remove
>
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config
>
> flag
>
> to catalog constructor/yaml for turning on and off Hive built-in
>
> functions
>
> with the flag set to 'false' by default and proper doc added to help
>
> users
>
> make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous
>
> function
>
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case if we
> cannot reach an agreement, I propose forbid users registering temp
> functions in the same name as a built-in function, like MySQL's
>
> approach,
>
> for the moment. It won't have any performance concern, since built-in
> functions are all in memory and thus cost of a name check will be
>
> really
>
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z 
>  wrote:
>
>  From what I have seen, there are a couple of focal disagreements:
>
> 1. Resolution order: temp function --> flink built-in function -->
>
> catalog
>
> function vs flink built-in function --> temp function -> catalog
>
> function.
>
> 2. "External" built-in functions: how to treat built-in functions in
> external system and how users reference them
>
> For #1, I agree with Bowen that temp function needs to be at the
>
> highest
>
> priority because that's how a user might overwrite a built-in function
> without referencing a persistent, overwriting catalog function with a
>
> fully
>
> qualified name. Putting built-in functions at the highest priority
> eliminates that usage.
>
> For #2, I saw a general agreement on referencing "external" built-in
> functions such as those in Hive needs to be explicit and deterministic
>
> even
>
> though different approaches are proposed. To limit the scope and
>
> simply
>
> the
>
> usage, it seems making sense to me to introduce special syntax for
>
> user  to
>
> explicitly reference an external built-in function such as hive1::sqrt
>
> or
>
> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API
>
> call
>
> hive1.getFunction(ObjectPath functionName) where the database name is
> absent for bulit-in functions available in that catalog hive1. I
>
> understand
>
> that

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-04 Thread Bowen Li
Maybe Xuefu missed my email. Please let me know what your thoughts are on
the summary; if there's still major controversy, I can take time to
re-evaluate that part.


On Wed, Sep 4, 2019 at 2:25 PM Xuefu Z  wrote:

> Thank all for the sharing thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li  wrote:
>
> > Let me try to summarize and conclude the long thread so far:
> >
> > 1. For order of temp function v.s. built-in function:
> >
> > I think Dawid's point that temp function should be of fully qualified
> path
> > is a better reasoning to back the newly proposed order, and i agree we
> > don't need to follow Hive/Spark.
> >
> > However, I'd rather not change fundamentals of temporary functions in
> this
> > FLIP. It belongs to a bigger story of how temporary objects should be
> > redefined and be handled uniformly - currently temporary tables and views
> > (those registered from TableEnv#registerTable()) behave different than
> what
> > Dawid propose for temp functions, and we need a FLIP to just unify their
> > APIs and behaviors.
> >
> > I agree that backward compatibility is not an issue w.r.t Jark's points.
> >
> > ***Seems we do have consensus that it's acceptable to prevent users
> > registering a temp function in the same name as a built-in function. To
> > help us move forward, I'd like to propose setting such a restraint on
> temp
> > functions in this FLIP to simplify the design and avoid disputes.*** It
> > will also leave rooms for improvements in the future.
> >
> >
> > 2. For Hive built-in function:
> >
> > Thanks Timo for providing the Presto and Postgres examples. I feel
> modular
> > built-in functions can be a good fit for the geo and ml example as a
> native
> > Flink extension, but not sure if it fits well with external integrations.
> > Anyway, I think modular built-in functions is a bigger story and can be
> on
> > its own thread too, and our proposal doesn't prevent Flink from doing
> that
> > in the future.
> >
> > ***Seems we have consensus that users should be able to use built-in
> > functions of Hive or other external systems in SQL explicitly and
> > deterministically regardless of Flink built-in functions and the
> potential
> > modular built-in functions, via some new syntax like "mycat::func"? If
> so,
> > I'd like to propose removing Hive built-in functions from ambiguous
> > function resolution order, and empower users with such a syntax. This way
> > we sacrifice a little convenience for certainty***
> >
> >
> > What do you think?
> >
> > On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz 
> > wrote:
> >
> > > Hi,
> > >
> > > Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> > > performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
> > are
> > > very inconsistent in that manner (spark being way worse on that).
> > >
> > > Hive:
> > >
> > > You cannot overwrite all the built-in functions. I could overwrite most
> > of
> > > the functions I tried e.g. length, e, pi, round, rtrim, but there are
> > > functions I cannot overwrite e.g. CAST, ARRAY I get:
> > >
> > >
> > > *ParseException line 1:29 cannot recognize input near 'array' 'AS'
> *
> > >
> > > What is interesting is that I cannot ovewrite *array*, but I can
> ovewrite
> > > *map* or *struct*. Though hive behaves reasonable well if I manage to
> > > overwrite a function. When I drop the temporary function the native
> > > function is still available.
> > >
> > > Spark:
> > >
> > > Spark's behavior imho is super bad.
> > >
> > > Theoretically I could overwrite all functions. I was able e.g. to
> > > overwrite CAST function. I had to use though CREATE OR REPLACE
> TEMPORARY
> > > FUNCTION syntax. Otherwise I get an exception that a function already
> > > exists. However when I used the CAST function in a query it used the

Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-05 Thread Bowen Li
Hi,

I think having a Pulsar connector in Flink can be of mutual benefit to
both communities.

Another perspective is that the Pulsar connector would be the 1st streaming
connector that integrates with Flink's metadata management system and Catalog
APIs. It'll be cool to see how the integration turns out and whether we need
to improve the Flink Catalog stack, which is currently in beta, to cater to
streaming sources/sinks. Thus I'm in favor of merging the Pulsar connector
into Flink 1.10.

I'd suggest submitting smaller-sized PRs, e.g. maybe one for basic
source/sink functionality and another for schema and catalog integration,
just to make them easier to review.

It doesn't seem to hurt to wait for FLIP-27. But I don't think FLIP-27
should be a blocker in case it cannot make its way into 1.10 or
doesn't leave a reasonable amount of time for committers to review or for the
Pulsar connector to fully adapt to the new interfaces.

Bowen



On Thu, Sep 5, 2019 at 3:21 AM Becket Qin  wrote:

> Hi Till,
>
> You are right. It all depends on when the new source interface is going to
> be ready. Personally I think it would be there in about a month or so. But
> I could be too optimistic. It would also be good to hear what do Aljoscha
> and Stephan think as they are also involved in FLIP-27.
>
> In general I think we should have Pulsar connector in Flink 1.10,
> preferably with the new source interface. We can also check it in right now
> with old source interface, but I suspect few users will use it before the
> next official release. Therefore, it seems reasonable to wait a little bit
> to see whether we can jump to the new source interface. As long as we make
> sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Sep 5, 2019 at 3:59 PM Till Rohrmann  wrote:
>
> > Hi everyone,
> >
> > I'm wondering what the problem would be if we committed the Pulsar
> > connector before the new source interface is ready. If I understood it
> > correctly, then we need to support the old source interface anyway for
> the
> > existing connectors. By checking it in early I could see the benefit that
> > our users could start using the connector earlier. Moreover, it would
> > prevent that the Pulsar integration is being delayed in case that the
> > source interface should be delayed. The only downside I see is the extra
> > review effort and potential fixes which might be irrelevant for the new
> > source interface implementation. I guess it mainly depends on how certain
> > we are when the new source interface will be ready.
> >
> > Cheers,
> > Till
> >
> > On Thu, Sep 5, 2019 at 8:56 AM Becket Qin  wrote:
> >
> > > Hi Sijie and Yijie,
> > >
> > > Thanks for sharing your thoughts.
> > >
> > > Just want to have some update on FLIP-27. Although the FLIP wiki and
> > > discussion thread has been quiet for some time, a few committer /
> > > contributors in Flink community were actually prototyping the entire
> > thing.
> > > We have made some good progress there but want to update the FLIP wiki
> > > after the entire thing is verified to work in case there are some last
> > > minute surprise in the implementation. I don't have an exact ETA yet,
> > but I
> > > guess it is going to be within a month or so.
> > >
> > > I am happy to review the current Flink Pulsar connector and see if it
> > would
> > > fit in FLIP-27. It would be good to avoid the case that we checked in
> the
> > > Pulsar connector with some review efforts and shortly after that the
> new
> > > Source interface is ready.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen 
> > > wrote:
> > >
> > > > Thanks for all the feedback and suggestions!
> > > >
> > > > As Sijie said, the goal of the connector has always been to provide
> > > > users with the latest features of both systems as soon as possible.
> We
> > > > propose to contribute the connector to Flink and hope to get more
> > > > suggestions and feedback from Flink experts to ensure the high
> quality
> > > > of the connector.
> > > >
> > > > For FLIP-27, we noticed its existence at the beginning of reworking
> > > > the connector implementation based on Flink 1.9; we also wanted to
> > > > build a connector that supports both batch and stream computing based
> > > > on it.
> > > > However, it has been inactive for some time, so we decided to provide
> > > > a connector with most of the new features, such as the new type
> system
> > > > and the new catalog API first. We will pay attention to the progress
> > > > of FLIP-27 continually and incorporate it with the connector as soon
> > > > as possible.
> > > >
> > > > Regarding the test status of the connector, we are following the
> other
> > > > connectors' test in Flink repository and aimed to provide throughout
> > > > tests as we could. We are also happy to hear suggestions and
> > > > supervision from the Flink community to improve the stability and
> >

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-08 Thread Bowen Li
you shadow only function 'func' in database 'db' in current catalog?
>3. This point is still under discussion, but was mentioned a few
>times, that maybe we want to enable syntax cat.func for "external built-in
>functions". How would that affect statement from previous point? Would
>'db.func' shadow "external built-in function" in 'db' catalog or user
>functions as in point 2? Or maybe both?
>4. Lastly in fact to summarize the previous points. Assuming 2/3-part
>paths. Would the function resolution be actually as follows?:
>   1. temporary functions (1-part path)
>   2. built-in functions
>   3. temporary functions (2-part path)
>   4. 2-part catalog functions a.k.a. "external built-in functions"
>   (cat + func) - this is still under discussion, if we want that in the 
> other
>   focal point
>   5. temporary functions (3-part path)
>   6. 3-part catalog functions a.k.a. user functions
>
> I would be really grateful if you could explain me those questions, thanks.
>
> BTW, Thank you all for a healthy discussion.
>
> Best,
>
> Dawid
> On 04/09/2019 23:25, Xuefu Z wrote:
>
> Thanks all for sharing your thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li  
>
>
>  wrote:
>
>
> Let me try to summarize and conclude the long thread so far:
>
> 1. For order of temp function v.s. built-in function:
>
> I think Dawid's point that temp functions should have a fully qualified path
> is better reasoning to back the newly proposed order, and I agree we
> don't need to follow Hive/Spark.
>
> However, I'd rather not change fundamentals of temporary functions in this
> FLIP. It belongs to a bigger story of how temporary objects should be
> redefined and handled uniformly - currently temporary tables and views
> (those registered from TableEnv#registerTable()) behave differently from what
> Dawid proposes for temp functions, and we need a FLIP just to unify their
> APIs and behaviors.
>
> I agree that backward compatibility is not an issue w.r.t Jark's points.
>
> ***Seems we do have consensus that it's acceptable to prevent users
> registering a temp function in the same name as a built-in function. To
> help us move forward, I'd like to propose setting such a restraint on temp
> functions in this FLIP to simplify the design and avoid disputes.*** It
> will also leave room for improvements in the future.
>
>
> 2. For Hive built-in function:
>
> Thanks Timo for providing the Presto and Postgres examples. I feel modular
> built-in functions can be a good fit for the geo and ml example as a native
> Flink extension, but not sure if it fits well with external integrations.
> Anyway, I think modular built-in functions is a bigger story and can be on
> its own thread too, and our proposal doesn't prevent Flink from doing that
> in the future.
>
> ***Seems we have consensus that users should be able to use built-in
> functions of Hive or other external systems in SQL explicitly and
> deterministically regardless of Flink built-in functions and the potential
> modular built-in functions, via some new syntax like "mycat::func"? If so,
> I'd like to propose removing Hive built-in functions from the ambiguous
> function resolution order, and empowering users with such a syntax. This way
> we sacrifice a little convenience for certainty.***
>
>
> What do you think?
>
> On Wed, Sep 4, 2019 at 7:02 AM Dawid Wysakowicz  
>
>
> 
> wrote:
>
>
> Hi,
>
> Regarding the Hive & Spark support of TEMPORARY FUNCTIONS. I've just
> performed some experiments (hive-2.3.2 & spark 2.4.4) and I think they
>
> are
>
> very inconsistent in that manner (spark being way worse on that).
>
> Hive:
>
> You cannot overwrite all the built-in functions. I could overwrite most
>
> of
>
> the functions I tried e.g. length, e, pi, round, rtrim, but there are
> functions I cannot overwrite, e.g. CAST, ARRAY. I get:
>
>
> *ParseException line 1:29 cannot recognize input near 'array' 'AS' *
>
> What is interesting is that I cannot overwrite *array*,

[DISCUSS] modular built-in functions

2019-09-09 Thread Bowen Li
Hi all,

During the discussion of how to support Hive built-in functions in Flink in
FLIP-57 [1], an idea of "modular built-in functions" was brought up with
examples of "Extension" in Postgres [2] and "Plugin" in Presto [3]. Thus
I'd like to kick off a discussion to see if we should adopt such an
approach.

I'll try to summarize the basics of the idea (a rough code sketch follows the list):
- functions from modules (e.g. Geo, ML) can be loaded into Flink as
built-in functions
- modules can be configured with order, discovered using SPI or set via
code like "catalogManager.setFunctionModules(CoreFunctions, GeoFunctions,
HiveFunctions)"
- built-in functions from external systems, like Hive, can be packaged
into such a module
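
To make the summary above more concrete, here is a minimal, purely
illustrative sketch; the FunctionModule interface, the module classes and
setFunctionModules(...) are hypothetical names for the proposal, not existing
Flink APIs:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical module contract: a module contributes a set of names to the
// built-in function namespace.
interface FunctionModule {
    List<String> listFunctions();
}

class CoreFunctions implements FunctionModule {
    public List<String> listFunctions() { return Arrays.asList("concat", "rand"); }
}

class GeoFunctions implements FunctionModule {
    public List<String> listFunctions() { return Arrays.asList("st_contains"); }
}

class HiveFunctions implements FunctionModule {
    public List<String> listFunctions() { return Arrays.asList("concat", "get_json_object"); }
}

class FunctionModules {
    // Modules are kept in the order they were configured; same-named functions
    // (e.g. "concat" above) must be disambiguated by that order.
    private final List<FunctionModule> modules = new ArrayList<>();

    void setFunctionModules(FunctionModule... configured) {
        modules.clear();
        modules.addAll(Arrays.asList(configured));
    }

    // The first module in the configured order that declares the name wins.
    String resolve(String functionName) {
        for (FunctionModule module : modules) {
            if (module.listFunctions().contains(functionName)) {
                return module.getClass().getSimpleName() + "#" + functionName;
            }
        }
        return null;
    }
}

With setFunctionModules(new CoreFunctions(), new HiveFunctions()), "concat"
resolves to the Flink implementation; reversing the order would silently pick
the Hive one, which is exactly the name-collision concern raised below.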

I took time and researched Presto Plugin and Postgres Extension, and here
are some of my findings.

Presto:
- "Presto's Catalog associated with a connector, and a catalog only
contains schemas and references a data source via a connector." [4] A
Presto catalog doesn't have the concept of catalog functions, thus all
Presto functions don't have namespaces. Neither does Presto have function
DDL [5].
    - Plugins are not specific to functions - "Plugins can provide
additional Connectors, Types, Functions, and System Access Control" [6]
- Thus, I feel a Plugin in Presto acts more as a "catalog" which is
similar to catalogs in Flink. Since all Presto functions don't have
namespaces, it probably can be seen as a built-in function module.

Postgres:
- Postgres extension is always installed to a schema, not the entire
cluster. There's a "schema_name" param in extension creation DDL - "The
name of the schema in which to install the extension's objects, given that
the extension allows its contents to be relocated. The named schema must
already exist. If not specified, and the extension's control file does not
specify a schema either, the current default object creation schema is
used." [7]  Thus it also acts as "catalog" for schema, and thus functions
in extension are not built-in functions to Postgres.

Therefore, I feel the examples are not exactly the "built-in function
modules" that were brought up, but feel free to correct me if I'm wrong.

Going back to the idea itself: while it seems to be a simpler concept and
design in some ways, I have two concerns:
1. The major one is still on name resolution - how to deal with name
collisions?
    - Not allowing duplicated names won't work for Hive built-in functions,
as many of them share names with Flink's, so we must allow modules
containing same-named functions to be registered
    - One assumption of this approach seems to be that, given modules are
specified in order, functions from modules can be overridden according to the
order?
    - If so, how can users reference a function that is overridden in the
above case (e.g. I may want to switch KMEANS between modules ML1 and ML2
with different implementations)?
 - If it's supported, it seems we still need some new syntax?
 - If it's not supported, that seems to be a major limitation for
users
2. The minor one is that allowing built-in functions from external systems to be
accessed within Flink so widely can bring performance issues to users' jobs
    - Unlike the potential native Flink Geo or ML functions, built-in
functions from external systems come with a pretty big performance penalty
in Flink due to data conversions and a different invocation mechanism.
Supporting Hive built-in functions is mainly for simplifying migration from
Hive. I'm not sure it makes sense when a user job has nothing to do with
Hive data but unintentionally ends up using Hive built-in functions without
knowing it's penalized on performance. Though docs can help to some extent,
not all users really read docs in detail.

An alternative is to treat "function modules" as catalogs.
- For Flink native function modules like Geo or ML, they can be discovered
and registered automatically at runtime with a predefined catalog name, like
"ml" or "ml1", which should be unique. Their functions are considered
built-in functions of that catalog, and can be referenced in some new syntax
like "catalog::func", e.g. "ml::kmeans" and "ml1::kmeans".
- For built-in functions from external systems (e.g. Hive), they have to be
referenced either as "catalog::func" to make sure users are explicitly
expecting those external functions, or as complementary built-in functions
to Flink if a config "enable_hive_built_in_functions" in HiveCatalog is
turned on.

Either approach seems to have its own benefits, and I'm open for discussion
and would like to hear others' opinions and use cases where a specific
solution is required.

Thanks,
Bowen


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html
[2] https://www.postgresql.org/docs/10/extend-extensions.html
[3] https://prestodb.github.io/docs/current/develop/functions.html
[4]
https://prestodb.github.io/docs/current/overview/concepts.html#data-sources

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-11 Thread Bowen Li
Hi,

Thanks @Fabian @Dawid and everyone else for sharing your thoughts!

First, I'd like to take Hive built-in functions out of this FLIP to keep
our original scope and avoid controversy around a potential modular
approach. I will remove Hive built-in functions from the google doc.

Then the focus of debate is mainly the function resolution order and the temp
function namespace, which are somewhat related. I roughly summarized this
thread, and currently we are debating two approaches (sketched in code after
the summary), with preferences from the following people:

Option 1:
Proposal: temp functions will be of 1-part path (function name only),
and can override built-in functions. The ambiguous function resolution
order is thus 1) temp functions 2) built-in functions 3) catalog functions
in the current catalog/database
Votes: Xuefu, Bowen, Fabian, Jark

Option 2:
Proposal: temp functions will be of 3-part path (with catalog,
database, and function name), and temp functions cannot override built-in
functions. The ambiguous function resolution order is thus 1) built-in
functions 2) temp functions (in 3-part path) 3) catalog functions in the
current catalog/database
Votes:  Dawid, Timo
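
For illustration, a minimal sketch of option 1's lookup order for an
ambiguous (not fully qualified) function reference; the class and method names
are invented for this example. Option 2 would check built-in functions before
temporary ones and key temporary functions by a 3-part path instead.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: sketches option 1's ambiguous function resolution order.
class AmbiguousFunctionResolver {
    // temp functions registered under a 1-part name (function name only)
    private final Map<String, String> temporaryFunctions = new HashMap<>();
    // Flink built-in functions
    private final Map<String, String> builtInFunctions = new HashMap<>();
    // catalog functions in the current catalog/database
    private final Map<String, String> currentCatalogDbFunctions = new HashMap<>();

    Optional<String> resolve(String name) {
        // 1) temporary functions (they may shadow built-ins under option 1)
        if (temporaryFunctions.containsKey(name)) {
            return Optional.of(temporaryFunctions.get(name));
        }
        // 2) built-in functions
        if (builtInFunctions.containsKey(name)) {
            return Optional.of(builtInFunctions.get(name));
        }
        // 3) catalog functions in the current catalog/database
        return Optional.ofNullable(currentCatalogDbFunctions.get(name));
    }
}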


Do you think we need a separate voting thread on the two options in the
community, or are we able to conclude from the above summary?



On Wed, Sep 11, 2019 at 8:09 AM Dawid Wysakowicz 
wrote:

> Hi Fabian,
> Thank you for your response.
> Regarding the temporary function, just wanted to clarify one thing: the
> 3-part identifier does not mean the user always has to provide the catalog
> & database explicitly, the same way a user does not have to provide them
> when e.g. creating a permanent table or view. It does mean, though, that
> functions are always stored within a database, the same way as all the
> permanent objects and other temporary objects (tables, views). If not given
> explicitly, the current catalog & database would be used, both in the create
> statement and when using the function.
>
> Point taken, though, that your preference would be to support overriding
> built-in functions.
>
> Best,
> Dawid
>
> On Wed, 11 Sep 2019, 21:14 Fabian Hueske,  wrote:
>
> > Hi all,
> >
> > I'd like to add my opinion on this topic as well ;-)
> >
> > In general, I think overriding built-in function with temp functions has
> a
> > couple of benefits but also a few challenges:
> >
> > * Users can reimplement the behavior of a built-in function of a different
> > system, e.g., for backward compatibility after a migration.
> > * I don't think that "accidental" overrides and surprising semantics are
> an
> > issue or dangerous. The user registered the temp function in the same
> > session and should therefore be aware of the changed semantics.
> > * I see that not all built-in functions can be overridden, like the CAST
> > example that Dawid gave. However, I think these should be a small
> fraction
> > and such functions could be blacklisted. Sure, that's not super
> consistent,
> > but should (IMO) not be a big issue in practice.
> > * Temp functions should be easy to use. Requiring a 3-part addressing
> makes
> > them a lot less user friendly, IMO. Users need to think about what
> catalog
> > and db to choose when registering them. Also using a temp function in a
> > query becomes less convenient. Moreover, I agree with Bowen's concerns
> that
> > a 3-part addressing scheme reduces the temporal appearance of the
> function.
> >
> > From the three possible solutions, my preference order is
> > 1) 1-part address with override of built-in
> > 2) 1-part address without override of built-in
> > 3) 3-part address
> >
> > Regarding the issue of external built-in functions, I don't think that
> > Timo's proposal of modules is fully orthogonal to this discussion.
> > A Hive function module could be an alternative to offering Hive functions
> > as part of Hive's catalog.
> > From a user's point of view, I think that modules would be a "cleaner"
> > integration ("Why do I need a Hive catalog if all I want to do is apply a
> > Hive function on a Kafka table?").
> > However, the module approach clearly has the problem of dealing with
> > same-named functions in different modules (e.g., a Hive function and a
> > Flink built-in function).
> > The catalog approach has the benefit that functions can be addressed like
> > hiveCat::func (or a similar path).
> >
> > I'm not sure what's the best solution here.
> >
> > Cheers,
> > Fabian
> >
> >
> > Am Mo., 9. Sept. 2019 um 06:30 Uhr schrieb Bowen Li  >:
> >
> > > Hi,
>

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-13 Thread Bowen Li
Hi Fabian,

Yes, I agree 1-part/no-override is the least favorable, thus I didn't
include it as a voting option; the discussion is mainly between
1-part/override built-in and 3-part/no-override built-in.

Re > However, it means that temp functions are differently treated than
other db objects.
IMO, the treatment difference results from the fact that functions are a
bit different from other objects - Flink doesn't have any other built-in
objects (tables, views) except functions.

Cheers,
Bowen


[DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-09-17 Thread Bowen Li
Hi devs,

We'd like to kick off a conversation on "FLIP-68:  Extend Core Table System
with Modular Plugins" [1].

The modular approach was raised in the discussion of how to support Hive
built-in functions in FLIP-57 [2]. As we discussed and looked deeper, we
think it’s a good opportunity to broaden the design and the corresponding
problem it aims to solve. The motivation is to expand Flink’s core table
system and enable users to do customizations by writing pluggable modules.

There are two aspects of the motivation:
1. Empower users to write code and do customized development for the Flink
table core
2. Enable users to integrate Flink with the cores and built-in objects of other
systems, so users can seamlessly reuse what they are familiar with from other
SQL systems as core and built-in objects of the Flink table system

Please take a look; feedback is welcome.

Bowen

[1]
https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit?usp=sharing
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Bowen Li
ers should be able to override all catalog objects consistently
> >> according
> >> > to FLIP-64 (Support for Temporary Objects in Table module). If
> functions
> >> > are treated completely different, we need more code and special cases.
> >> From
> >> > an implementation perspective, this topic only affects the lookup
> logic
> >> > which is rather low implementation effort which is why I would like to
> >> > clarify the remaining items. As you said, we have a slight consenus on
> >> > overriding built-in functions; we should also strive for reaching
> >> consensus
> >> > on the remaining topics.
> >> > >>
> >> > >> @Dawid: I like your idea as it ensures registering catalog objects
> >> > consistent and the overriding of built-in functions more explicit.
> >> > >>
> >> > >> Thanks,
> >> > >> Timo
> >> > >>
> >> > >>
> >> > >> On 17.09.19 11:59, kai wang wrote:
> >> > >>> hi, everyone
> >> > >>> I think this flip is very meaningful. it supports functions that
> >> can be
> >> > >>> shared by different catalogs and dbs, reducing the duplication of
> >> > functions.
> >> > >>>
> >> > >>> Our group based on flink's sql parser module implements create
> >> function
> >> > >>> feature, stores the parsed function metadata and schema into
> mysql,
> >> and
> >> > >>> also customizes the catalog, customizes sql-client to support
> custom
> >> > >>> schemas and functions. Loaded, but the function is currently
> global,
> >> > and is
> >> > >>> not subdivided according to catalog and db.
> >> > >>>
> >> > >>> In addition, I very much hope to participate in the development of
> >> this
> >> > >>> flip, I have been paying attention to the community, but found it
> is
> >> > more
> >> > >>> difficult to join.
> >> > >>> thank you.
> >> > >>>
> >> > >>> Xuefu Z  于2019年9月17日周二 上午11:19写道:
> >> > >>>
> >> > >>>> Thanks to Timo and Dawid for sharing their thoughts.
> >> > >>>>
> >> > >>>> It seems to me that there is a general consensus on having temp
> >> > functions
> >> > >>>> that have no namespaces and overwrite built-in functions. (As a
> >> side
> >> > note
> >> > >>>> for comparability, the current user defined functions are all
> >> > temporary and
> >> > >>>> having no namespaces.)
> >> > >>>>
> >> > >>>> Nevertheless, I can also see the merit of having namespaced temp
> >> > functions
> >> > >>>> that can overwrite functions defined in a specific cat/db.
> However,
> >> > this
> >> > >>>> idea appears orthogonal to the former and can be added
> >> incrementally.
> >> > >>>>
> >> > >>>> How about we first implement non-namespaced temp functions now
> and
> >> > leave
> >> > >>>> the door open for namespaced ones for later releases as the
> >> > requirement
> >> > >>>> might become more crystal? This also helps shorten the debate and
> >> > allow us
> >> > >>>> to make some progress along this direction.
> >> > >>>>
> >> > >>>> As to Dawid's idea of having a dedicated cat/db to host the
> >> temporary
> >> > temp
> >> > >>>> functions that don't have namespaces, my only concern is the
> >> special
> >> > >>>> treatment for a cat/db, which makes code less clean, as evident
> in
> >> > treating
> >> > >>>> the built-in catalog currently.
> >> > >>>>
> >> > >>>> Thanks,
> >> > >>>> Xuefiu
> >> > >>>>
> >> > >>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
> >> > >>>> wysakowicz.da...@gmail.com>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> Hi,
&

Re: [VOTE] Improve TableFactory to add Context

2020-02-05 Thread Bowen Li
+1, LGTM

On Tue, Feb 4, 2020 at 11:28 PM Jark Wu  wrote:

> +1 form my side.
> Thanks for driving this.
>
> Btw, could you also attach a JIRA issue with the changes described in it,
> so that users can find the issue through the mailing list in the future.
>
> Best,
> Jark
>
> On Wed, 5 Feb 2020 at 13:38, Kurt Young  wrote:
>
> > +1 from my side.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Feb 5, 2020 at 10:59 AM Jingsong Li 
> > wrote:
> >
> > > Hi all,
> > >
> > > Interface updated.
> > > Please re-vote.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for the improvement of
> > > > TableFactory, which was discussed and
> > > > reached a consensus in the discussion thread [1].
> > > >
> > > > The vote will be open for at least 72 hours. I'll try to close it
> > > > unless there is an objection or not enough votes.
> > > >
> > > > [1]
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-02-17 Thread Bowen Li
Hi all,

If there are no more comments, I would like to kick off a vote for this FLIP
[1].

FYI, the FLIP number has changed to 93 since there was a race condition in
taking 92.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

On Wed, Jan 22, 2020 at 11:05 AM Bowen Li  wrote:

> Hi Flavio,
>
> First, this is a generic question on how flink-jdbc is set up, not
> specific to the JDBC catalog, thus it is better suited to its own thread.
>
> But to quickly answer your question, you need to see where the
> incompatibility is. There may be incompatibility in 1) the JDBC drivers and 2)
> the databases. 1) is fairly stable and backward-compatible. 2) normally has
> to do with your queries, not the driver.
>
>
>
> On Tue, Jan 21, 2020 at 3:21 PM Flavio Pompermaier 
> wrote:
>
>> Hi all,
>> I'm happy to see a lot of interest in easing the integration with JDBC
>> data
>> sources. Maybe this could be a rare situation (not in my experience
>> however..) but what if I have to connect to the same type of source (e.g.
>> MySQL) with 2 incompatible versions...? How can I load the 2 (or more)
>> connector jars without causing conflicts?
>>
>> Il Mar 14 Gen 2020, 23:32 Bowen Li  ha scritto:
>>
>> > Hi devs,
>> >
>> > I've updated the wiki according to feedbacks. Please take another look.
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Jan 10, 2020 at 2:24 PM Bowen Li  wrote:
>> >
>> > > Thanks everyone for the prompt feedback. Please see my response below.
>> > >
>> > > > In Postgres, the TIME/TIMESTAMP WITH TIME ZONE has the
>> > > java.time.Instant semantic, and should be mapped to Flink's
>> > TIME/TIMESTAMP
>> > > WITH LOCAL TIME ZONE
>> > >
>> > > Zhenghua, you are right that pg's 'timestamp with timezone' should be
>> > > translated into flink's 'timestamp with local timezone'. I don't find
>> > 'time
>> > > with (local) timezone' though, so we may not support that type from
>> pg in
>> > > Flink.
>> > >
>> > > > I suggest that the parameters can be completely consistent with the
>> > > JDBCTableSource / JDBCTableSink. If you take a look to JDBC api:
>> > > "DriverManager.getConnection".
>> > > That allow "default db, username, pwd" things optional. They can
>> included
>> > > in URL. Of course JDBC api also allows establishing connections to
>> > > different databases in a db instance. So I think we don't need
>> provide a
>> > > "base_url", we can just provide a real "url". To be consistent with
>> JDBC
>> > > api.
>> > >
>> > > Jingsong, what I'm saying is a builder can be added on demand later if
>> > > there's enough user requesting it, and doesn't need to be a core part
>> of
>> > > the FLIP.
>> > >
>> > > Besides, unfortunately Postgres doesn't allow changing databases via
>> > JDBC.
>> > >
>> > > JDBC provides different connecting options as you mentioned, but I'd
>> like
>> > > to keep our design and API simple and having to handle extra parsing
>> > logic.
>> > > And it doesn't shut the door for what you proposed as a future effort.
>> > >
>> > > > Since the PostgreSQL does not have catalog but schema under
>> database,
>> > > why not mapping the PG-database to Flink catalog and PG-schema to
>> Flink
>> > > database
>> > >
>> > > Danny, because 1) there are frequent use cases where users want to
>> switch
>> > > databases or referencing objects across databases in a pg instance 2)
>> > > schema is an optional namespace layer in pg, it always has a default
>> > value
>> > > ("public") and can be invisible to users if they'd like to as shown in
>> > the
>> > > FLIP 3) as you mentioned it is specific to postgres, and I don't feel
>> > it's
>> > > necessary to map Postgres substantially different than others DBMSs
>> with
>> > > additional complexity
>> > >
>> > > >'base_url' configuration: We are following the configuration format
>> > > guideline [1] which suggest to use dash (-) instead of underline (_).
>> And
>> > > I'm a little confused the

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Bowen Li
Congrats, Jingsong!

On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  wrote:

> Congratulations Jingsong!
>
> Cheers,
> Till
>
> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao  wrote:
>
>>   Congratulations Jingsong!
>>
>>Best,
>>Yun
>>
>> --
>> From:Jingsong Li 
>> Send Time:2020 Feb. 21 (Fri.) 21:42
>> To:Hequn Cheng 
>> Cc:Yang Wang ; Zhijiang <
>> wangzhijiang...@aliyun.com>; Zhenghua Gao ; godfrey he
>> ; dev ; user <
>> u...@flink.apache.org>
>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>
>> Thanks everyone~
>>
>> It's my pleasure to be part of the community. I hope I can make a better
>> contribution in future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng  wrote:
>> Congratulations Jingsong! Well deserved.
>>
>> Best,
>> Hequn
>>
>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang  wrote:
>> Congratulations!Jingsong. Well deserved.
>>
>>
>> Best,
>> Yang
>>
>> Zhijiang  于2020年2月21日周五 下午1:18写道:
>> Congrats Jingsong! Welcome on board!
>>
>> Best,
>> Zhijiang
>>
>> --
>> From:Zhenghua Gao 
>> Send Time:2020 Feb. 21 (Fri.) 12:49
>> To:godfrey he 
>> Cc:dev ; user 
>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>
>> Congrats Jingsong!
>>
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he  wrote:
>> Congrats Jingsong! Well deserved.
>>
>> Best,
>> godfrey
>>
>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
>> Congratulations!Jingsong. You deserve it
>>
>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
>> Congrats Jingsong!
>>
>> On Fri, 21 Feb 2020 at 11:41, Dian Fu  wrote:
>>
>> > Congrats Jingsong!
>> >
>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
>> > >
>> > > Congratulations Jingsong! Well deserved.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >
>> > >> Congratulations! Jingsong
>> > >>
>> > >>
>> > >> Best,
>> > >> Dan Zou
>> > >>
>> >
>> >
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>>
>>


[VOTE] FLIP-93: JDBC catalog and Postgres catalog

2020-02-27 Thread Bowen Li
Hi all,

I'd like to kick off the vote for FLIP-93 [1] to add JDBC catalog and
Postgres catalog.

The vote will last for at least 72 hours, following the consensus voting
protocol.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

Discussion thread:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-92-JDBC-catalog-and-Postgres-catalog-td36505.html


Re: Creating TemporalTable based on Catalog table in SQL Client

2020-03-03 Thread Bowen Li
Hi Gyula,

What line 622 (the link you shared) does is not registering catalogs, but
setting an already registered catalog as the current one. As you can see
from the method and its comment, catalogs are loaded first before any
tables in yaml are registered, so you should be able to achieve what you
described.

Bowen

On Tue, Mar 3, 2020 at 5:16 AM Gyula Fóra  wrote:

> Hi all!
>
> I was testing the TemporalTable functionality in the SQL client while using
> the Hive Catalog and I ran into the following problem.
>
> I have a table created in the Hive catalog and I want to create a temporal
> table over it.
>
> As we cannot create temporal tables in SQL directly I have to define it in
> the environment yaml file. Unfortunately it seems to be impossible to
> reference a table only present in the catalog (not in the yaml) as catalogs
> are loaded only after creating the temporal table (see
>
> https://github.com/apache/flink/blob/master/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/ExecutionContext.java#L622
> )
>
> I am wondering if it would make sense to set the catalogs before all else
> or if that would cause some other problems.
>
> What do you think?
> Gyula
>


Re: Creating TemporalTable based on Catalog table in SQL Client

2020-03-04 Thread Bowen Li
You would need to reference the table with a fully qualified name that
includes the catalog and database.

On Wed, Mar 4, 2020 at 02:17 Gyula Fóra  wrote:

> I guess it will only work now if you specify the catalog name too when
> referencing the table.
>
>
> On Wed, Mar 4, 2020 at 11:15 AM Gyula Fóra  wrote:
>
> > You are right but still if the default catalog is something else and
> > that's the one containing the table then it still wont work currently.
> >
> > Gyula
> >
> > On Wed, Mar 4, 2020 at 5:08 AM Bowen Li  wrote:
> >
> >> Hi Gyula,
> >>
> >> What line 622 (the link you shared) does is not registering catalogs,
> but
> >> setting an already registered catalog as the current one. As you can see
> >> from the method and its comment, catalogs are loaded first before any
> >> tables in yaml are registered, so you should be able to achieve what you
> >> described.
> >>
> >> Bowen
> >>
> >> On Tue, Mar 3, 2020 at 5:16 AM Gyula Fóra  wrote:
> >>
> >> > Hi all!
> >> >
> >> > I was testing the TemporalTable functionality in the SQL client while
> >> using
> >> > the Hive Catalog and I ran into the following problem.
> >> >
> >> > I have a table created in the Hive catalog and I want to create a
> >> temporal
> >> > table over it.
> >> >
> >> > As we cannot create temporal tables in SQL directly I have to define
> it
> >> in
> >> > the environment yaml file. Unfortunately it seems to be impossible to
> >> > reference a table only present in the catalog (not in the yaml) as
> >> catalogs
> >> > are loaded only after creating the temporal table (see
> >> >
> >> >
> >>
> https://github.com/apache/flink/blob/master/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/ExecutionContext.java#L622
> >> > )
> >> >
> >> > I am wondering if it would make sense to set the catalogs before all
> >> else
> >> > or if that would cause some other problems.
> >> >
> >> > What do you think?
> >> > Gyula
> >> >
> >>
> >
>


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-04 Thread Bowen Li
Thanks, Jingsong, for bringing this up. We've received lots of feedback in
the past few months that the complexity involved in different Hive versions
has been quite painful for users to start with. So it's great to step
forward and deal with this issue.

Before getting to a decision, can you please explain:

1) why you proposed segregating hive versions into the 5 ranges above?
2) what different Hive features are supported in the 5 ranges?
3) have you tested whether the proposed corresponding Flink module
will be fully compatible with each Hive version range?

Thanks,
Bowen



On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee  wrote:

> Hi all,
>
> I'd like to propose introduce flink-connector-hive-xx modules.
>
> We have documented the dependencies detailed information[2]. But still has
> some inconvenient:
> - Too many versions, users need to pick one version from 8 versions.
> - Too many versions, It's not friendly to our developers either, because
> there's a problem/exception, we need to look at eight different versions of
> hive client code, which are often various.
> - Too many jars, for example, users need to download 4+ jars for Hive 1.x
> from various places.
>
> We have discussed in [1] and [2], but unfortunately, we can not achieve an
> agreement.
>
> For improving this, I'd like to introduce few flink-connector-hive-xx
> modules in flink-connectors, module contains all the dependencies related
> to hive. And only support lower hive metastore versions:
> - "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2
> - "flink-connector-hive-2.0" to support hive 2.0.0 - 2.0.1
> - "flink-connector-hive-2.2" to support hive 2.1.0 - 2.2.0
> - "flink-connector-hive-2.3" to support hive 2.3.0 - 2.3.6
> - "flink-connector-hive-3.1" to support hive 3.0.0 - 3.1.2
>
> Users can choose one and download to flink/lib. It includes all hive
> things.
>
> I try to use a single module to deploy multiple versions, but I can not
> find a suitable way, because different modules require different versions
> and different dependencies.
>
> What do you think?
>
> [1]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html
>
> Best,
> Jingsong Lee
>


Re: [VOTE] FLIP-93: JDBC catalog and Postgres catalog

2020-03-04 Thread Bowen Li
I'm glad to announce that the voting of FLIP-93 has passed, with 7 +1  (3
binding: Jingsong, Kurt, Jark, 4 non-binding: Benchao, zoudan, Terry,
Leonard) and no -1.

Thanks everyone for participating!

Cheers,
Bowen

On Mon, Mar 2, 2020 at 7:33 AM Leonard Xu  wrote:

> +1 (non-binding).
>
>  Very useful feature, especially for ETL. It will make connecting to
> existing DB systems easier.
>
> Best,
> Leonard
>
> > 在 2020年3月2日,21:58,Jark Wu  写道:
> >
> > +1 from my side.
> >
> > Best,
> > Jark
> >
> > On Mon, 2 Mar 2020 at 21:40, Kurt Young  wrote:
> >
> >> +1
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Mon, Mar 2, 2020 at 5:32 PM Jingsong Lee 
> >> wrote:
> >>
> >>> +1 from my side.
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Mon, Mar 2, 2020 at 11:06 AM Terry Wang  wrote:
> >>>
> >>>> +1 (non-binding).
> >>>> With this feature, we can more easily interact traditional database in
> >>>> flink.
> >>>>
> >>>> Best,
> >>>> Terry Wang
> >>>>
> >>>>
> >>>>
> >>>>> 2020年3月1日 18:33,zoudan  写道:
> >>>>>
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> Best,
> >>>>> Dan Zou
> >>>>>
> >>>>>
> >>>>>> 在 2020年2月28日,02:38,Bowen Li  写道:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'd like to kick off the vote for FLIP-93 [1] to add JDBC catalog
> >> and
> >>>>>> Postgres catalog.
> >>>>>>
> >>>>>> The vote will last for at least 72 hours, following the consensus
> >>> voting
> >>>>>> protocol.
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog
> >>>>>>
> >>>>>> Discussion thread:
> >>>>>>
> >>>>
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-92-JDBC-catalog-and-Postgres-catalog-td36505.html
> >>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>>
> >>
>
>


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-04 Thread Bowen Li
Thanks Jingsong for your explanation! I'm +1 for this initiative.

According to your description, I think it makes sense to incorporate
support of Hive 2.2 into that of 2.0/2.1, reducing the number of ranges to
4.

A couple minor followup questions:
1) will there be a base module like "flink-connector-hive-base" which holds
all the common logic of these proposed modules and is compiled into the
uber jar of "flink-connector-hive-xxx"?
2) according to my observation, it's more common to set the version in
module name to be the lowest version that this module supports, e.g. for
Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
rather than "flink-connector-hive-1.2"


On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li  wrote:

> Thanks Bowen for involving.
>
> > why you proposed segregating hive versions into the 5 ranges above? &
> what different Hive features are supported in the 5 ranges?
>
> For only higher client dependencies version support lower hive metastore
> versions:
> - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats, we
> can throw exception for the unsupported feature.
> - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> change.
> - Hive 2.2 no thrift change.
> - Hive 2.3 change many things, lots of thrift change.
> - Hive 3+, not null. unique, timestamp, so many things.
>
> All these things can be found in hive_metastore.thrift.
>
> I think I can try do more effort in implementation to use Hive 2.2 to
> support Hive 2.0. So the range size will be 4.
>
> > have you tested that whether the proposed corresponding Flink module will
> be fully compatible with each Hive version range?
>
> Yes, I have done some tests, not really for "fully", but it is a technical
> judgment.
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
>
> > Thanks, Jingsong, for bringing this up. We've received lots of feedbacks
> in
> > the past few months that the complexity involved in different Hive
> versions
> > has been quite painful for users to start with. So it's great to step
> > forward and deal with such issue.
> >
> > Before getting on a decision, can you please explain:
> >
> > 1) why you proposed segregating hive versions into the 5 ranges above?
> > 2) what different Hive features are supported in the 5 ranges?
> > 3) have you tested that whether the proposed corresponding Flink module
> > will be fully compatible with each Hive version range?
> >
> > Thanks,
> > Bowen
> >
> >
> >
> > On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to propose introduce flink-connector-hive-xx modules.
> > >
> > > We have documented the dependencies detailed information[2]. But still
> > has
> > > some inconvenient:
> > > - Too many versions, users need to pick one version from 8 versions.
> > > - Too many versions, It's not friendly to our developers either,
> because
> > > there's a problem/exception, we need to look at eight different
> versions
> > of
> > > hive client code, which are often various.
> > > - Too many jars, for example, users need to download 4+ jars for Hive
> 1.x
> > > from various places.
> > >
> > > We have discussed in [1] and [2], but unfortunately, we can not achieve
> > an
> > > agreement.
> > >
> > > For improving this, I'd like to introduce few flink-connector-hive-xx
> > > modules in flink-connectors, module contains all the dependencies
> related
> > > to hive. And only support lower hive metastore versions:
> > > - "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2
> > > - "flink-connector-hive-2.0" to support hive 2.0.0 - 2.0.1
> > > - "flink-connector-hive-2.2" to support hive 2.1.0 - 2.2.0
> > > - "flink-connector-hive-2.3" to support hive 2.3.0 - 2.3.6
> > > - "flink-connector-hive-3.1" to support hive 3.0.0 - 3.1.2
> > >
> > > Users can choose one and download to flink/lib. It includes all hive
> > > things.
> > >
> > > I try to use a single module to deploy multiple versions, but I can not
> > > find a suitable way, because different modules require different
> versions
> > > and different dependencies.
> > >
> > > What do you think?
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
> > > [2]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html
> > >
> > > Best,
> > > Jingsong Lee
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
> I have some hesitation, because the actual version number can better
reflect the actual dependency. For example, if the user also knows the
field hiveVersion[1]. He may enter the wrong hiveVersion because of the
name, or he may have the wrong expectation for the hive built-in functions.

Sorry, I'm not sure if my proposal was understood correctly.

What I'm saying is: your original proposal suggested, to take an example,
naming the module "flink-connector-hive-1.2" to support Hive 1.0.0 -
1.2.2, i.e. a name including the highest Hive version it supports. I'm
suggesting to name it "flink-connector-hive-1.0", a name including the
lowest Hive version it supports.

What do you think?



On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li  wrote:

> Hi Bowen, thanks for your reply.
>
> > will there be a base module like "flink-connector-hive-base" which holds
> all the common logic of these proposed modules
>
> Maybe we don't need, their implementation is only "pom.xml". Different
> versions have different dependencies.
>
> > it's more common to set the version in module name to be the lowest
> version that this module supports
>
> I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.
>
> [1] https://github.com/apache/flink/pull/11304
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
>
> > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> >
> > According to your description, I think it makes sense to incorporate
> > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of ranges
> to
> > 4.
> >
> > A couple minor followup questions:
> > 1) will there be a base module like "flink-connector-hive-base" which
> holds
> > all the common logic of these proposed modules and is compiled into the
> > uber jar of "flink-connector-hive-xxx"?
> > 2) according to my observation, it's more common to set the version in
> > module name to be the lowest version that this module supports, e.g. for
> > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > rather than "flink-connector-hive-1.2"
> >
> >
> > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > wrote:
> >
> > > Thanks Bowen for involving.
> > >
> > > > why you proposed segregating hive versions into the 5 ranges above? &
> > > what different Hive features are supported in the 5 ranges?
> > >
> > > For only higher client dependencies version support lower hive
> metastore
> > > versions:
> > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats,
> > we
> > > can throw exception for the unsupported feature.
> > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> > > change.
> > > - Hive 2.2 no thrift change.
> > > - Hive 2.3 change many things, lots of thrift change.
> > > - Hive 3+, not null. unique, timestamp, so many things.
> > >
> > > All these things can be found in hive_metastore.thrift.
> > >
> > > I think I can try do more effort in implementation to use Hive 2.2 to
> > > support Hive 2.0. So the range size will be 4.
> > >
> > > > have you tested that whether the proposed corresponding Flink module
> > will
> > > be fully compatible with each Hive version range?
> > >
> > > Yes, I have done some tests, not really for "fully", but it is a
> > technical
> > > judgment.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
> > >
> > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > feedbacks
> > > in
> > > > the past few months that the complexity involved in different Hive
> > > versions
> > > > has been quite painful for users to start with. So it's great to step
> > > > forward and deal with such issue.
> > > >
> > > > Before getting on a decision, can you please explain:
> > > >
> > > > 1) why you proposed segregating hive versions into the 5 ranges
> above?
> > > > 2) what different Hive features are supported in the 5 ranges?
> > > > 3) have you tested that whether the proposed 

Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
Hi Jingsong,

I think I misunderstood you. So your argument is that, to support hive
1.0.0 - 1.2.2, we are actually using Hive 1.2.2 and thus we name the flink
module as "flink-connector-hive-1.2", right? It makes sense to me now.

+1 for this change.

Cheers,
Bowen

On Thu, Mar 5, 2020 at 6:53 PM Jingsong Li  wrote:

> Hi Bowen,
>
> My idea is to directly provide the really dependent version, such as hive
> 1.2.2, our jar name is hive 1.2.2, so that users can directly and clearly
> know the version. As for which metastore is supported, we can guide it in
> the document, otherwise, write 1.0, and the result version is indeed 1.2.2,
> which will make users have wrong expectations.
>
> Another, maybe 2.3.6 can support 2.0-2.2 after some efforts.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 1:00 AM Bowen Li  wrote:
>
> > > I have some hesitation, because the actual version number can better
> > reflect the actual dependency. For example, if the user also knows the
> > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > name, or he may have the wrong expectation for the hive built-in
> functions.
> >
> > Sorry, I'm not sure if my proposal is understood correctly.
> >
> > What I'm saying is, in your original proposal, taking an example,
> suggested
> > naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 -
> > 1.2.2, a name including the highest Hive version it supports. I'm
> > suggesting to name it "flink-connector-hive-1.0", a name including the
> > lowest Hive version it supports.
> >
> > What do you think?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li 
> > wrote:
> >
> > > Hi Bowen, thanks for your reply.
> > >
> > > > will there be a base module like "flink-connector-hive-base" which
> > holds
> > > all the common logic of these proposed modules
> > >
> > > Maybe we don't need, their implementation is only "pom.xml". Different
> > > versions have different dependencies.
> > >
> > > > it's more common to set the version in module name to be the lowest
> > > version that this module supports
> > >
> > > I have some hesitation, because the actual version number can better
> > > reflect the actual dependency. For example, if the user also knows the
> > > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > > name, or he may have the wrong expectation for the hive built-in
> > functions.
> > >
> > > [1] https://github.com/apache/flink/pull/11304
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
> > >
> > > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > > >
> > > > According to your description, I think it makes sense to incorporate
> > > > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of
> > ranges
> > > to
> > > > 4.
> > > >
> > > > A couple minor followup questions:
> > > > 1) will there be a base module like "flink-connector-hive-base" which
> > > holds
> > > > all the common logic of these proposed modules and is compiled into
> the
> > > > uber jar of "flink-connector-hive-xxx"?
> > > > 2) according to my observation, it's more common to set the version
> in
> > > > module name to be the lowest version that this module supports, e.g.
> > for
> > > > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > > > rather than "flink-connector-hive-1.2"
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > > > wrote:
> > > >
> > > > > Thanks Bowen for involving.
> > > > >
> > > > > > why you proposed segregating hive versions into the 5 ranges
> > above? &
> > > > > what different Hive features are supported in the 5 ranges?
> > > > >
> > > > > For only higher client dependencies version support lower hive
> > > metastore
> > > > > versions:
> > > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column
> > stats,
> > > > we
> > > > > can throw exception for the unsupported feature.
> > > > > - Hive 2.0 and Hive 2.1, prima

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-03-10 Thread Bowen Li
Thanks Danny for kicking off the effort.

The root cause of too much manual work is that Flink DDL has mixed 3 types of
params together and doesn't handle each of them very well. Below is how I
categorize them and the corresponding solutions in my mind:

- type 1: Metadata of external data, like external endpoint/url,
username/pwd, schemas, formats.

Such metadata are mostly already accessible in the external system as long as
endpoints and credentials are provided. Flink can get them thru catalogs, but
we haven't had many catalogs yet and thus Flink just hasn't been able to
leverage that. So the solution should be building more catalogs. Such
params should be part of a Flink table DDL/definition, and not overridable
by any means.


- type 2: Runtime params, like jdbc connector's fetch size, elasticsearch
connector's bulk flush size.

Such params don't affect query results, but affect how results are produced
(e.g. fast or slow, aka performance) - they are essentially execution and
implementation details. They change often in exploration or development
stages, but not quite so frequently in well-defined long-running pipelines.
They should always have default values and can be missing in a query. They
can be part of a table DDL/definition, but should also be replaceable in a
query - *this is what table "hints" in FLIP-113 should cover*.


- type 3: Semantic params, like kafka connector's start offset.

Such params affect query results - the semantics. They'd better be expressed
as filter conditions in the WHERE clause that can be pushed down. They change
almost every time a query starts and have nothing to do with metadata, thus
they should not be part of the table definition/DDL, nor be persisted in
catalogs. If they must be, users should create views to keep such params
around (note this is different from variable substitution).


Take Flink-Kafka as an example. Once we get these params right, here're the
steps users need to do to develop and run a Flink job (a rough sketch in code
follows below):
- configure a Flink ConfluentSchemaRegistry with url, username, and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified
timestamp) in SQL CLI; Flink automatically retrieves all metadata of
schema, file format, etc. and starts the job
- users want to make the job read the Kafka topic faster, so it goes as "SELECT
* FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
- done and satisfied, users submit it to production
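
A rough sketch of those steps in code, under heavy assumptions: the
"confluent_catalog" catalog and the "faster.read.key" hint key are
hypothetical, the PROPERTIES(...) hint syntax is only what is being proposed
in this thread, and filtering on an "offset" column is likewise part of the
proposal rather than current Flink behavior.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaParamsSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // type 1: metadata comes from a (hypothetical) schema-registry-backed
        // catalog, so endpoints/credentials/schemas are not retyped per query
        tEnv.useCatalog("confluent_catalog");

        // type 3: semantic params expressed as a filter in the query itself
        Table fromYesterday = tEnv.sqlQuery(
                "SELECT * FROM mykafka WHERE `offset` > TIMESTAMP '2020-03-09 12:00:00'");

        // type 2: runtime params tweaked per query via the proposed hint syntax,
        // without touching the table definition in the catalog
        Table tuned = tEnv.sqlQuery(
                "SELECT * FROM mykafka /*+ PROPERTIES('faster.read.key'='value') */ "
                        + "WHERE `offset` > TIMESTAMP '2020-03-09 12:00:00'");
    }
}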


Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think it's a
nice-to-have feature, but not a strategically critical, long-term solution,
because
1) It may seem promising at the current stage to solve the
too-much-manual-work problem, but that's only because Flink hasn't
leveraged catalogs well and handled the 3 types of params above properly.
Once we get the params types right, the LIKE syntax won't be that
important, and will be just an easier way to create tables without retyping
long fields like username and pwd.
2) Note that only some rare type of catalog can store k-v property pair, so
table created this way often cannot be persisted. In the foreseeable
future, such catalog will only be HiveCatalog, and not everyone has a Hive
metastore. To be honest, without persistence, recreating tables every time
this way is still a lot of keyboard typing.

Cheers,
Bowen

On Tue, Mar 10, 2020 at 8:07 PM Kurt Young  wrote:

> If a specific connector want to have such parameter and read if out of
> configuration, then that's fine.
> If we are talking about a configuration for all kinds of sources, I would
> be super careful about that.
> It's true it can solve maybe 80% cases, but it will also make the left 20%
> feels weird.
>
> Best,
> Kurt
>
>
> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu  wrote:
>
> > Hi Kurt,
> >
> > #3 Regarding to global offset:
> > I'm not saying to use the global configuration to override connector
> > properties by the planner.
> > But the connector should take this configuration and translate into their
> > client API.
> > AFAIK, almost all the message queues support earliest and latest and a
> > timestamp value as start point.
> > So we can support 3 options for this configuration: "earliest", "latest"
> > and a timestamp string value.
> > Of course, this can't solve 100% cases, but I guess can sovle 80% or 90%
> > cases.
> > And the remaining cases can be resolved by LIKE syntax which I guess is
> not
> > very common cases.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 11 Mar 2020 at 10:33, Kurt Young  wrote:
> >
> > > Good to have such lovely discussions. I also want to share some of my
> > > opinions.
> > >
> > > #1 Regarding to error handling: I also think ignore invalid hints would
> > be
> > > dangerous, maybe
> > > the simplest solution is just throw an exception.
> > >
> > > #2 Regarding to property replacement: I don't think we should
> constraint
> > > ourself to
> > > the meaning of the word "hint", and forbidden it modifying any
> properties
> > > which can effect
> > > query results. IMO `PROPERTIES` is one 

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-03-11 Thread Bowen Li
it is true that our DDL is not standard compliant by using the WITH
> >> clause. Nevertheless, we aim for not diverging too much and the LIKE
> >> clause is an example of that. It will solve things like overwriting
> >> WATERMARKs, add additional/modifying properties and inherit schema.
> >>
> >> Bowen is right that Flink's DDL is mixing 3 types definition together.
> >> We are not the first ones that try to solve this. There is also the SQL
> >> MED standard [1] that tried to tackle this problem. I think it was not
> >> considered when designing the current DDL.
> >>
> >> Currently, I see 3 options for handling Kafka offsets. I will give some
> >> examples and look forward to feedback here:
> >>
> >> *Option 1* Runtime and semantic parms as part of the query
> >>
> >> `SELECT * FROM MyTable('offset'=123)`
> >>
> >> Pros:
> >> - Easy to add
> >> - Parameters are part of the main query
> >> - No complicated hinting syntax
> >>
> >> Cons:
> >> - Not SQL compliant
> >>
> >> *Option 2* Use metadata in query
> >>
> >> `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`
> >>
> >> `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`
> >>
> >> Pros:
> >> - SQL compliant in the query
> >> - Access of metadata in the DDL which is required anyway
> >> - Regular pushdown rules apply
> >>
> >> Cons:
> >> - Users need to add an additional column in the DDL
> >>
> >> *Option 3*: Use hints for properties
> >>
> >> `
> >> SELECT *
> >> FROM MyTable /*+ PROPERTIES('offset'=123) */
> >> `
> >>
> >> Pros:
> >> - Easy to add
> >>
> >> Cons:
> >> - Parameters are not part of the main query
> >> - Cryptic syntax for new users
> >> - Not standard compliant.
> >>
> >> If we go with this option, I would suggest to make it available in a
> >> separate map and don't mix it with statically defined properties. Such
> >> that the factory can decide which properties have the right to be
> >> overwritten by the hints:
> >> TableSourceFactory.Context.getQueryHints(): ReadableConfig
> >>
> >> Regards,
> >> Timo
> >>
> >> [1] https://en.wikipedia.org/wiki/SQL/MED
> >>
> >> Currently I see 3 options as a
> >>
> >>
> >> On 11.03.20 07:21, Danny Chan wrote:
> >>> Thanks Bowen ~
> >>>
> >>> I agree we should somehow categorize our connector parameters.
> >>>
> >>> For type1, I’m already preparing a solution like the Confluent schema
> registry + Avro schema inference thing, so this may not be a problem in the
> near future.
> >>>
> >>> For type3, I have some questions:
> >>>
> >>>> "SELECT * FROM mykafka WHERE offset > 12pm yesterday”
> >>>
> >>> Where does the offset column come from, a virtual column from the
> table schema, you said that
> >>>
> >>>> They change
> >>> almost every time a query starts and have nothing to do with metadata,
> thus
> >>> should not be part of table definition/DDL
> >>>
> >>> But why you can reference it in the query, I’m confused for that, can
> you elaborate a little ?
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年3月11日 +0800 PM12:52,Bowen Li ,写道:
> >>>> Thanks Danny for kicking off the effort
> >>>>
> >>>> The root cause of too much manual work is Flink DDL has mixed 3 types
> of
> >>>> params together and doesn't handle each of them very well. Below are
> how I
> >>>> categorize them and corresponding solutions in my mind:
> >>>>
> >>>> - type 1: Metadata of external data, like external endpoint/url,
> >>>> username/pwd, schemas, formats.
> >>>>
> >>>> Such metadata are mostly already accessible in external system as
> long as
> >>>> endpoints and credentials are provided. Flink can get it thru
> catalogs, but
> >>>> we haven't had many catalogs yet and thus Flink just hasn't been able
> to
> >>>> leverage that. So the solution should be building more catalogs. Such
> >>>> params should be part of a Flink table 

Re: FLIP-117: HBase catalog

2020-03-16 Thread Bowen Li
Hi,

I think the core of the JIRA right now is to investigate whether catalogs of
schemaless systems like HBase and Elasticsearch bring practical value to
users. I haven't used these SQL connectors before, and thus don't have much
to say in this case. Can anyone describe how it would work? Maybe @Yu
or @Zheng can chime in.

w.r.t. unsupported operation exceptions, they should be thrown in targeted
getters (e.g. getView(), getFunction()). General listing APIs like
listView(), listFunction() should not throw them but just return empty
results, for the sake of not breaking the user's SQL experience (see the
sketch below). To dedup code, such common implementations can be moved to
AbstractCatalog to make the APIs look cleaner. I recall that there was an
intention to refactor the catalog API signatures, but I haven't kept up with it.
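
A rough sketch of that convention for a schemaless-store catalog; the method
names only loosely mirror Flink's Catalog interface, and the exact signatures
(ObjectPath, CatalogBaseTable, checked exceptions) are deliberately omitted:

import java.util.Collections;
import java.util.List;

// Illustrative sketch: listing APIs stay quiet, targeted getters are explicit.
public class SchemalessCatalogSketch {

    // Listing APIs return empty results so generic "show views/functions"
    // tooling keeps working instead of failing.
    public List<String> listViews(String databaseName) {
        return Collections.emptyList();
    }

    public List<String> listFunctions(String databaseName) {
        return Collections.emptyList();
    }

    // Targeted getters surface the limitation explicitly.
    public Object getView(String databaseName, String viewName) {
        throw new UnsupportedOperationException("views are not supported by this catalog");
    }

    public Object getFunction(String databaseName, String functionName) {
        throw new UnsupportedOperationException("functions are not supported by this catalog");
    }
}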

Bowen

On Sun, Mar 15, 2020 at 10:19 PM Jingsong Li  wrote:

> Thanks Flavio for driving. Personally I am +1 for integrating HBase tables.
>
> I start a new topic for discussion. It is related but not the core of this
> FLIP.
> In the FLIP, I can see:
> - Does HBase support the concept of partitions..? I don't think so..
> - Does HBase support functions? I don't think so..
> - Does HBase support statistics? I don't think so..
> - Does HBase support views? I don't think so..
>
> And in JDBC catalog [1]. There are lots of UnsupportedOperationExceptions
> too.
> And maybe for confluent catalog, UnsupportedOperationExceptions come again.
> Lots of UnsupportedOperationExceptions looks unhappy to this catalog api...
> So can we do some refactor to catalog api? I can see a lot of catalogs
> just need provide table information without partitions, functions,
> statistics, views...
>
> CC: @Dawid Wysakowicz  @Bowen Li
> 
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog
>
> Best,
> Jingsong Lee
>
> On Sat, Mar 14, 2020 at 7:36 AM Flavio Pompermaier 
> wrote:
>
>> Hello everybody,
>> I started a new FLIP to discuss about an HBaseCatalog implementation[1]
>> after the opening of the relative issue by Bowen [2].
>> I drafted a very simple version of the FLIP just to discuss about the
>> critical points (in red) in order to decide how to proceed.
>>
>> Best,
>> Flavio
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-117%3A+HBase+catalog
>> [2] https://issues.apache.org/jira/browse/FLINK-16575
>>
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-20 Thread Bowen Li
+1.

I would suggest taking a step even further and seeing what users really need
to test/try/play with the Table API and Flink SQL. Besides this one, here are
some more sources and sinks that I have developed or used previously to
facilitate building Flink table/SQL pipelines.


   1. random input data source
  - should generate random data at a specified rate according to schema
  - purposes
 - test Flink pipeline and data can end up in external storage
 correctly
 - stress test Flink sink as well as tuning up external storage
  2. print data sink
  - should print data in row format in console
  - purposes
 - make it easier to test Flink SQL job e2e in IDE
 - test Flink pipeline and ensure output data format/value is
 correct
  3. no output data sink
  - just swallow output data without doing anything
  - purpose
 - evaluate and tune performance of Flink source and the whole
  pipeline. Users don't need to worry about sink back pressure

These may be taken into consideration all together as an effort to lower
the threshold of running Flink SQL/table API, and facilitate users' daily
work.
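
For illustration, here is a rough sketch of what DDLs for such built-in testing
connectors could look like. The connector names and property keys below are
hypothetical placeholders just to make the idea concrete, not an agreed design:

CREATE TABLE random_source (
  user_id BIGINT,
  amount DOUBLE,
  ts TIMESTAMP(3)
) WITH (
  'connector.type' = 'random',        -- hypothetical name for the random data source
  'rows-per-second' = '1000'          -- hypothetical rate option
);

CREATE TABLE console_sink (
  user_id BIGINT,
  cnt BIGINT
) WITH (
  'connector.type' = 'print'          -- hypothetical name for the print/console sink
);

CREATE TABLE no_output_sink (
  user_id BIGINT,
  cnt BIGINT
) WITH (
  'connector.type' = 'blackhole'      -- hypothetical name for the no-output sink
);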

Cheers,
Bowen


On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li  wrote:

> Hi all,
>
> I heard some users complain that table is difficult to test. Now with SQL
> client, users are more and more inclined to use it to test rather than
> program.
> The most common example is Kafka source. If users need to test their SQL
> output and checkpoint, they need to:
>
> - 1. Launch a standalone Kafka and create a Kafka topic.
> - 2. Write a program, mock input records, and produce records to the Kafka
> topic.
> - 3. Then test in Flink.
>
> The step 1 and 2 are annoying, although this test is E2E.
>
> Then I found StatefulSequenceSource. It is very good because it already deals
> with checkpointing, so it fits the checkpoint mechanism well. Usually,
> users have checkpointing turned on in production.
>
> With computed columns, users can easily create a sequence source DDL similar
> to a Kafka DDL. Then they can test inside Flink, without needing to launch other
> things.
>
> Have you consider this? What do you think?
>
> CC: @Aljoscha Krettek  the author
> of StatefulSequenceSource.
>
> Best,
> Jingsong Lee
>


Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-23 Thread Bowen Li
gt; > > 3.blackhole sink
> > > > - very useful for high performance testing of Flink
> > > > - I've also run into users trying UDF to output, not sink, so they
> need
> > > > this sink as well.
> > > >
> > > > DDL:
> > > > CREATE TABLE blackhole_table (
> > > > ...
> > > > ) WITH (
> > > > 'connector.type' = 'blackhole'
> > > > )
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Mon, Mar 23, 2020 at 12:04 PM Dian Fu 
> > wrote:
> > > >
> > > > > Thanks Jingsong for bringing up this discussion. +1 to this
> > proposal. I
> > > > > think Bowen's proposal makes much sense to me.
> > > > >
> > > > > This is also a painful problem for PyFlink users. Currently there
> is
> > no
> > > > > built-in easy-to-use table source/sink and it requires users to
> > write a
> > > > lot
> > > > > of code to trying out PyFlink. This is especially painful for new
> > users
> > > > who
> > > > > are not familiar with PyFlink/Flink. I have also encountered the
> > > tedious
> > > > > process Bowen encountered, e.g. writing random source connector,
> > print
> > > > sink
> > > > > and also blackhole print sink as there are no built-in ones to use.
> > > > >
> > > > > Regards,
> > > > > Dian
> > > > >
> > > > > > 在 2020年3月22日,上午11:24,Jark Wu  写道:
> > > > > >
> > > > > > +1 to Bowen's proposal. I also saw many requirements on such
> > built-in
> > > > > > connectors.
> > > > > >
> > > > > > I will leave some my thoughts here:
> > > > > >
> > > > > >> 1. datagen source (random source)
> > > > > > I think we can merge the functionality of sequence-source into
> > random
> > > > > source
> > > > > > to allow users to custom their data values.
> > > > > > Flink can generate random data according to the field types,
> users
> > > > > > can customize their values to be more domain specific, e.g.
> > > > > > 'field.user'='User_[1-9]{0,1}'
> > > > > > This will be similar to kafka-datagen-connect[1].
> > > > > >
> > > > > >> 2. console sink (print sink)
> > > > > > This will be very useful in production debugging, to easily
> output
> > an
> > > > > > intermediate view or result view to a `.out` file.
> > > > > > So that we can look into the data representation, or check dirty
> > > data.
> > > > > > This should be out-of-box without manually DDL registration.
> > > > > >
> > > > > >> 3. blackhole sink (no output sink)
> > > > > > This is very useful for high performance testing of Flink, to
> > > measure
> > > > > the
> > > > > > throughput of the whole pipeline without sink.
> > > > > > Presto also provides this as a built-in connector [2].
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification
> > > > > > [2]: https://prestodb.io/docs/current/connector/blackhole.html
> > > > > >
> > > > > >
> > > > > > On Sat, 21 Mar 2020 at 12:31, Bowen Li 
> > wrote:
> > > > > >
> > > > > >> +1.
> > > > > >>
> > > > > >> I would suggest to take a step even further and see what users
> > > really
> > > > > need
> > > > > >> to test/try/play with table API and Flink SQL. Besides this one,
> > > > here're
> > > > > >> some more sources and sinks that I have developed or used
> > previously
> > > > to
> > > > > >> facilitate building Flink table/SQL pipelines.
> > > > > >>
> > > > > >>
> > > > > >>  

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-19 Thread Bowen Li
> > >> >> >> built-in
> > > >> >> >>>>>> function or global temp function. (In absence of the
> special
> > > >> >> >>> namespace,
> > > >> >> >>>>> the
> > > >> >> >>>>>> resolution order is the same as in #2.)
> > > >> >> >>>>>>
> > > >> >> >>>>>> My personal preference is #1, given the unknown use case
> and
> > > >> >> >>> introduced
> > > >> >> >>>>>> complexity for #2 and #3. However, #2 is an acceptable
> > > >> alternative.
> > > >> >> >>>> Thus,
> > > >> >> >>>>>> my votes are:
> > > >> >> >>>>>>
> > > >> >> >>>>>> +1 for #1
> > > >> >> >>>>>> +0 for #2
> > > >> >> >>>>>> -1 for #3
> > > >> >> >>>>>>
> > > >> >> >>>>>> Everyone, please cast your vote (in above format please!),
> > or
> > > >> let
> > > >> >> >> me
> > > >> >> >>>> know
> > > >> >> >>>>>> if you have more questions or other candidates.
> > > >> >> >>>>>>
> > > >> >> >>>>>> Thanks,
> > > >> >> >>>>>> Xuefu
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>>
> > > >> >> >>>>>> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek <
> > > >> >> >>> aljos...@apache.org>
> > > >> >> >>>>>> wrote:
> > > >> >> >>>>>>
> > > >> >> >>>>>>> Hi,
> > > >> >> >>>>>>>
> > > >> >> >>>>>>> I think this discussion and the one for FLIP-64 are very
> > > >> >> >> connected.
> > > >> >> >>>> To
> > > >> >> >>>>>>> resolve the differences, think we have to think about the
> > > basic
> > > >> >> >>>>>> principles
> > > >> >> >>>>>>> and find consensus there. The basic questions I see are:
> > > >> >> >>>>>>>
> > > >> >> >>>&g

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-19 Thread Bowen Li
Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
temporary built-in function in the same session? With the former one, they
can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the latter
one, I'm not sure how users can "restore" the original builtin function
easily from an "altered" function without introducing further nonstandard
SQL syntax.
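
Just to make the comparison concrete, the two candidate syntaxes would look roughly
like this. Neither is standard SQL, and the keywords and the example class name are
only placeholders that are still under discussion:

-- preferred: explicitly create and drop a temporary function that shadows a built-in one
CREATE TEMPORARY BUILTIN FUNCTION concat AS 'com.example.MyConcat';
DROP TEMPORARY BUILTIN FUNCTION concat;

-- alternative: temporarily "alter" the built-in function; there is no obvious,
-- standard way to restore the original built-in function afterwards
ALTER BUILTIN FUNCTION concat TEMPORARILY AS 'com.example.MyConcat';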

Also please pardon me as I realized using net vote counts may not be a good idea... I'm
trying to fit this vote into cases listed in Flink Bylaw [1].

From the following result, the majority seems to be #2 too as it has the
most approval so far and doesn't have strong "-1".

#1: 3 (+1), 1 (0), 4 (-1)
#2: 4 (0), 3 (+1), 1 (+0.5)
   * Dawid -1/0 depending on keyword
#3: 2 (+1), 3 (-1), 3 (0)

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026

On Thu, Sep 19, 2019 at 10:30 AM Bowen Li  wrote:

> Hi,
>
> Thanks everyone for your votes. I summarized the result as following:
>
> #1:3 (+1), 1 (0), 4(-1) - net: -1
> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> Dawid -1/0 depending on keyword
> #3:2(+1), 3(-1), 3(0)   - net: -1
>
> Given the result, I'd like to change my vote for #2 from 0 to +1, to make
> it a stronger case with net +3.5. So the votes so far are:
>
> #1:3 (+1), 1 (0), 4(-1) - net: -1
> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> Dawid -1/0 depending on keyword
> #3:2(+1), 3(-1), 3(0)   - net: -1
>
> What do you think? Do you think we can conclude with this result? Or would
> you like to take it as a formal FLIP vote with 3 days voting period?
>
> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER BUILTIN
> FUNCTION xxx TEMPORARILY" because
> 1. the syntax is more consistent with "CREATE FUNCTION" and "CREATE
> TEMPORARY FUNCTION"
> 2. "ALTER BUILTIN FUNCTION xxx TEMPORARILY" implies it alters a built-in
> function but it actually doesn't, the logic only creates a temp function
> with higher priority than that built-in function in ambiguous resolution
> order; and it would behave inconsistently with "ALTER FUNCTION".
>
>
>
> On Thu, Sep 19, 2019 at 2:58 AM Fabian Hueske  wrote:
>
>> I agree, it's very similar from the implementation point of view and the
>> implications.
>>
>> IMO, the difference is mostly on the mental model for the user.
>> Instead of having a special class of temporary functions that have
>> precedence over builtin functions it suggests to temporarily change
>> built-in functions.
>>
>> Fabian
>>
>> Am Do., 19. Sept. 2019 um 11:52 Uhr schrieb Kurt Young > >:
>>
>> > Hi Fabian,
>> >
>> > I think it's almost the same with #2 with different keyword:
>> >
>> > CREATE TEMPORARY BUILTIN FUNCTION xxx
>> >
>> > Best,
>> > Kurt
>> >
>> >
>> > On Thu, Sep 19, 2019 at 5:50 PM Fabian Hueske 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I thought about it a bit more and think that there is some good value
>> in
>> > my
>> > > last proposal.
>> > >
>> > > A lot of complexity comes from the fact that we want to allow
>> overriding
>> > > built-in functions which are differently addressed as other functions
>> > (and
>> > > db objects).
>> > > We could just have "CREATE TEMPORARY FUNCTION" do exactly the same
>> thing
>> > as
>> > > "CREATE FUNCTION" and treat both functions exactly the same except
>> that:
>> > > 1) temp functions disappear at the end of the session
>> > > 2) temp function are resolved before other functions
>> > >
>> > > This would be Dawid's proposal from the beginning of this thread (in
>> case
>> > > you still remember... ;-) )
>> > >
>> > > Temporarily overriding built-in functions would be supported with an
>> > > explicit command like
>> > >
>> > > ALTER BUILTIN FUNCTION xxx TEMPORARILY AS ...
>> > >
>> > > This would also address the concerns about accidentally changing the
>> > > semantics of built-in functions.
>> > > IMO, it can't get much more explicit than the above command.
>> > >
>> > > Sorry for bringing up a new option in the middle of the discussion,
>> but
>> > as
>> > > I said, I think 

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-09-19 Thread Bowen Li
Thanks everyone for your feedback. I've converted it to a FLIP wiki [1].

Please take another look. If there's no more concerns, I'd like to start a
voting thread for it.

Thanks

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins




On Tue, Sep 17, 2019 at 11:25 AM Bowen Li  wrote:

> Hi devs,
>
> We'd like to kick off a conversation on "FLIP-68:  Extend Core Table
> System with Modular Plugins" [1].
>
> The modular approach was raised in discussion of how to support Hive
> built-in functions in FLIP-57 [2]. As we discussed and looked deeper, we
> think it’s a good opportunity to broaden the design and the corresponding
> problem it aims to solve. The motivation is to expand Flink’s core table
> system and enable users to do customizations by writing pluggable modules.
>
> There are two aspects of the motivation:
> 1. Empower users to write code and do customized development for Flink
> table core
> 2. Enable users to integrate Flink with cores and built-in objects of
> other systems, so users can reuse what they are familiar with in other SQL
> systems seamlessly as core and built-ins of Flink table
>
> Please take a look, and feedbacks are welcome.
>
> Bowen
>
> [1]
> https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit?usp=sharing
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-20 Thread Bowen Li
"SYSTEM" sounds good to me. FYI, this FLIP only impacts low level of the
SQL function stack and won't actually involve any DDL, thus I will just
document the decision and we should keep it in mind when it's time to
implement the DDLs.

I'm in the process of updating the FLIP to reflect changes required for
option #2, will send a new version for review soon.



On Fri, Sep 20, 2019 at 4:02 PM Dawid Wysakowicz 
wrote:

> I also like the 'System' keyword. I think we can assume we reached
> consensus on this topic.
>
> On Sat, 21 Sep 2019, 06:37 Xuefu Z,  wrote:
>
> > +1 for using the keyword "SYSTEM". Thanks to Timo for chiming in!
> >
> > --Xuefu
> >
> > On Fri, Sep 20, 2019 at 3:28 PM Timo Walther  wrote:
> >
> > > Hi everyone,
> > >
> > > sorry, for the late replay. I give also +1 for option #2. Thus, I guess
> > > we have a clear winner.
> > >
> > > I would also like to find a better keyword/syntax for this statement.
> > > Esp. the BUILTIN keyword can confuse people, because it could be
> written
> > > as BUILTIN, BUILDIN, BUILT_IN, or BUILD_IN. And we would need to
> > > introduce a new reserved keyword in the parser which affects also
> > > non-DDL queries. How about:
> > >
> > > CREATE TEMPORARY SYSTEM FUNCTION xxx
> > >
> > > The SYSTEM keyword is already a reserved keyword and in FLIP-66 we are
> > > discussing to prefix some of the function with a SYSTEM_ prefix like
> > > SYSTEM_WATERMARK. Also SQL defines syntax like "FOR SYSTEM_TIME AS OF".
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > > On 20.09.19 05:45, Bowen Li wrote:
> > > > Another reason I prefer "CREATE TEMPORARY BUILTIN FUNCTION" over
> "ALTER
> > > > BUILTIN FUNCTION xxx TEMPORARILY" is - what if users want to drop the
> > > > temporary built-in function in the same session? With the former one,
> > > they
> > > > can run something like "DROP TEMPORARY BUILTIN FUNCTION"; With the
> > latter
> > > > one, I'm not sure how users can "restore" the original builtin
> function
> > > > easily from an "altered" function without introducing further
> > nonstandard
> > > > SQL syntax.
> > > >
> > > > Also please pardon me as I realized using net may not be a good
> idea...
> > > I'm
> > > > trying to fit this vote into cases listed in Flink Bylaw [1].
> > > >
> > > > >From the following result, the majority seems to be #2 too as it has
> > the
> > > > most approval so far and doesn't have strong "-1".
> > > >
> > > > #1:3 (+1), 1 (0), 4(-1)
> > > > #2:4(0), 3 (+1), 1(+0.5)
> > > > * Dawid -1/0 depending on keyword
> > > > #3:2(+1), 3(-1), 3(0)
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> > > >
> > > > On Thu, Sep 19, 2019 at 10:30 AM Bowen Li 
> wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Thanks everyone for your votes. I summarized the result as
> following:
> > > >>
> > > >> #1:3 (+1), 1 (0), 4(-1) - net: -1
> > > >> #2:4(0), 2 (+1), 1(+0.5)  - net: +2.5
> > > >>  Dawid -1/0 depending on keyword
> > > >> #3:2(+1), 3(-1), 3(0)   - net: -1
> > > >>
> > > >> Given the result, I'd like to change my vote for #2 from 0 to +1, to
> > > make
> > > >> it a stronger case with net +3.5. So the votes so far are:
> > > >>
> > > >> #1:3 (+1), 1 (0), 4(-1) - net: -1
> > > >> #2:4(0), 3 (+1), 1(+0.5)  - net: +3.5
> > > >>  Dawid -1/0 depending on keyword
> > > >> #3:2(+1), 3(-1), 3(0)   - net: -1
> > > >>
> > > >> What do you think? Do you think we can conclude with this result? Or
> > > would
> > > >> you like to take it as a formal FLIP vote with 3 days voting period?
> > > >>
> > > >> BTW, I'd prefer "CREATE TEMPORARY BUILTIN FUNCTION" over "ALTER
> > BUILTIN
> > > >> FUNCTION xxx TEMPORARILY" because
> > > >> 1. the syntax is more consistent with "CREATE FUNCTION&quo

[VOTE] FLIP-68: Extend Core Table System with Modular Plugins

2019-09-23 Thread Bowen Li
Hi all,

I'd like to start a vote for FLIP-68 [1], since there's no more concern in
the discussion thread [2]

The vote will be open for minimum 3 days till 5:30pm UTC, Sep 26.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
[2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html


[VOTE] FLIP-57: Rework FunctionCatalog

2019-09-23 Thread Bowen Li
Hi all,

I'd like to start a voting thread for FLIP-57 [1], which we've reached
consensus in [2].

This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-23 Thread Bowen Li
Thanks all for your input!

I've updated FLIP-57 accordingly. To summarize the changes:

   - introduced a new concept of "temporary system functions", which have no
   namespace and override built-in functions
   - repositioned "temporary functions" to be those with namespaces that
   override catalog functions
   - updated FunctionCatalog APIs
   - redefined the ambiguous function resolution order to be:

   1. temporary system functions
   2. built-in functions
   3. temporary functions, of the current catalog/db
   4. catalog functions, in the current catalog/db
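
To illustrate the resolution order above with a sketch (the DDL, function names and
class names below are only illustrative; the FLIP itself does not define any DDL, and
the current catalog/db is assumed to be mycat/mydb):

-- shadows the built-in function of the same name (step 1 wins over step 2)
CREATE TEMPORARY SYSTEM FUNCTION concat AS 'com.example.MyConcat';

-- shadows a catalog function of the same name in mycat.mydb (step 3 wins over step 4)
CREATE TEMPORARY FUNCTION mycat.mydb.func1 AS 'com.example.Func1';

SELECT concat(a, b), func1(c) FROM MyTable;
-- concat resolves to the temporary system function,
-- func1 resolves to the temporary catalog function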

Since we've reached consensus on several most critical pieces of the FLIP,
I've started a separate voting thread on it.

Cheers,
Bowen


Re: [DISCUSS] FLIP-63: Rework table partition support

2019-09-23 Thread Bowen Li
Hi Jingsong,

Thanks for driving this effort!

Besides a few further comments on Catalog APIs that I just left, it LGTM.

Not sure why, but the voting thread in Gmail shows up in the same thread as
the discussion. After addressing all the comments, could you start a new,
separate thread to let other people be aware of it?

Thanks,
Bowen

On Mon, Sep 23, 2019 at 1:25 AM JingsongLee 
wrote:

>  Thanks for your discussion on google document.
> Comments addressed and added FileSystem connector chapter, and introduce
> code prototype for file system connector to unify flink file system and
> hive connectors.
>
> Looking forward to your feedbacks. Thank you.
>
> Best,
> Jingsong Lee
>
>
> --
> From:JingsongLee 
> Send Time:2019年9月18日(星期三) 09:45
> To:Kurt Young ; dev 
> Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>
> Thanks for your reply and google doc comments. It has been discussed
>  for two weeks now. I will start a vote thread.
>
> Best,
> Jingsong Lee
>
>
> --
> From:Kurt Young 
> Send Time:2019年9月16日(星期一) 15:55
> To:dev 
> Cc:JingsongLee 
> Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>
> +1 to this feature, I left some comments on google doc.
>
> Another comment is I think we should do some reorganize about the content
> when you converting this to a cwiki page. I will have some offline
> discussion
> with you.
>
> Since this feature seems to be a fairly big efforts, so I suggest we can
> settle
> down the design doc ASAP and start vote process.
> Best,
> Kurt
>
>
> On Thu, Sep 12, 2019 at 12:43 PM Biao Liu  wrote:
> Hi Jingsong,
>
>  Thanks for explaining. It looks cool!
>
>  Thanks,
>  Biao /'bɪ.aʊ/
>
>
>
>  On Wed, 11 Sep 2019 at 11:37, JingsongLee  .invalid>
>  wrote:
>
>  > Hi biao, thanks for your feedbacks:
>  >
>  > Actually, the runtime source partition of runtime is similar to split,
>  > which concerns data reading, parallelism and fault tolerance, all the
>  > runtime concepts.
>  > While table partition is only a virtual concept. Users are more likely
> to
>  > choose which partition to read and which partition to write. Users can
>  > manage their partitions.
>  > One is physical implementation correlation, the other is logical concept
>  > correlation.
>  > So I think they are two completely different things.
>  >
>  > About [2], The main problem is that how to write data to a catalog file
>  > system in stream mode, it is a general problem and has little to do with
>  > partition.
>  >
>  > [2]
>  >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html
>  >
>  > Best,
>  > Jingsong Lee
>  >
>  >
>  > --
>  > From:Biao Liu 
>  > Send Time:2019年9月10日(星期二) 14:57
>  > To:dev ; JingsongLee 
>  > Subject:Re: [DISCUSS] FLIP-63: Rework table partition support
>  >
>  > Hi Jingsong,
>  >
>  > Thank you for bringing this discussion. Since I don't have much
> experience
>  > of Flink table/SQL, I'll ask some questions from runtime or engine
>  > perspective.
>  >
>  > > ... where we describe how to partition support in flink and how to
>  > integrate to hive partition.
>  >
>  > FLIP-27 [1] introduces "partition" concept officially. The changes of
>  > FLIP-27 are not only about source interface but also about the whole
>  > infrastructure.
>  > Have you ever thought how to integrate your proposal with these changes?
>  > Or you just want to support "partition" in table layer, there will be no
>  > requirement of underlying infrastructure?
>  >
>  > I have seen a discussion [2] that seems be a requirement of
> infrastructure
>  > to support your proposal. So I have some concerns there might be some
>  > conflicts between this proposal and FLIP-27.
>  >
>  > 1.
>  >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
>  > 2.
>  >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notifyOnMaster-for-notifyCheckpointComplete-td32769.html
>  >
>  > Thanks,
>  > Biao /'bɪ.aʊ/
>  >
>  >
>  >
>  > On Fri, 6 Sep 2019 at 13:22, JingsongLee  .invalid>
>  > wrote:
>  > Hi everyone, thank you for your comments. Mail name was updated
>  >  and streaming-related concepts were added.
>  >
>  >  We would like to start a discussion thread on "FLIP-63: Rework table
>  >  partition support"(Design doc: [1]), where we describe how to partition
>  >  support in flink and how to integrate to hive partition.
>  >
>  >  This FLIP addresses:
>  > - Introduce whole story about partition support.
>  > - Introduce and discuss DDL of partition support.
>  > - Introduce static and dynamic partition insert.
>  > - Introduce partition pruning
>  > - Introduce dynamic partition implementation
>  > - Introduce FileFormatSink to deal with s

Re: [DISCUSS] FLIP 69 - Flink SQL DDL Enhancement

2019-09-23 Thread Bowen Li
Hi Terry,

Thanks for driving the effort! I left some comments in the doc.

AFAIU, the biggest motivation is to support DDLs in the SQL parser so that both
Table API and SQL CLI can share the same stack, despite the fact that SQL CLI
already supports some commands itself. However, I don't see details on how SQL CLI
would migrate to and depend on the SQL parser, and how Table API and SQL CLI would
actually share the SQL parser. I'm not sure yet how much work that will take; I
just want to double check that you didn't include these details because you
estimate them to be very trivial?


On Mon, Sep 16, 2019 at 1:46 AM Terry Wang  wrote:

> Hi everyone,
>
> In flink 1.9, we have introduced some awesome features such as complete
> catalog support[1] and sql ddl support[2]. These features have been a
> critical integration for Flink to be able to manage data and metadata like
> a classic RDBMS and make developers more easy to construct their
> real-time/off-line warehouse or sth similar base on flink.
>
> But there is still a lack of support on how Flink SQL DDL to manage
> metadata and data like classic RDBMS such as `alter table rename` and so on.
>
> So I’d like to kick off a discussion on enhancing Flink Sql Ddls:
>
> https://docs.google.com/document/d/1mhZmx1h2ecfL0x8OzYD1n-nVRn4yE7pwk4jGed4k7kc/edit?usp=sharing
> <
> https://docs.google.com/document/d/1mhZmx1h2ecfL0x8OzYD1n-nVRn4yE7pwk4jGed4k7kc/edit?usp=sharing
> >
>
> In short, it:
> - Add Catalog DDL enhancement support:  show catalogs / describe
> catalog / use catalog
> - Add Database DDL enhancement support:  show databases / create
> database / drop database/ alter database
> - Add Table DDL enhancement support:show tables/ describe
> table / alter table
> - Add Function DDL enhancement support: show functions/ create
> function /drop function
>
> Looking forward to your opinions.
>
> Best,
> Terry Wang
>
>
>
> [1]: https://issues.apache.org/jira/browse/FLINK-11275
> [2]: https://issues.apache.org/jira/browse/FLINK-10232
>  


[COMMITTER] repo locked due to synchronization issues

2019-09-23 Thread Bowen Li
Hi committers,

Recently I've run into a repo issue multiple times on different days. When I
tried to push a commit to master, git reported the following error:

```
remote: This repository has been locked due to synchronization issues:
remote:  - /x1/gitbox/broken/flink.txt exists due to a previous error, and
prevents pushes.
remote: This could either be a benign issue, or the repositories could be
out of sync.
remote: Please contact us...@infra.apache.org to have infrastructure
resolve the issue.
remote:
To https://gitbox.apache.org/repos/asf/flink.git
 ! [remote rejected]   master -> master (pre-receive hook declined)
error: failed to push some refs to '
https://gitbox.apache.org/repos/asf/flink.git'
```

This is quite a new issue that didn't come till two or three weeks ago. I
researched online with no luck. I also reported it to ASF INFRA [1] but
their suggested solution doesn't work.

The issue however usually goes away the next morning in PST, so I assume
someone from a different timezone in Asia or Europe fixes it somehow? Has
anyone run into it before? How did you fix it?

Thanks,
Bowen

[1] https://issues.apache.org/jira/projects/INFRA/issues/INFRA-18992


Re: [VOTE] FLIP-63: Rework table partition support

2019-09-24 Thread Bowen Li
+1. Thanks, Jingsong!

Bowen

On Tue, Sep 24, 2019 at 4:38 AM Terry Wang  wrote:

> +1, Overall looks good.
>
> Best,
> Terry Wang
>
>
>
> > 在 2019年9月24日,下午5:02,Kurt Young  写道:
> >
> > +1 from my side. Some implementation details could be revisited
> > again during code reviewing.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 24, 2019 at 3:14 PM Jingsong Li 
> wrote:
> >
> >> Just to clarify:
> >>
> >> FLIP wiki:
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> >>
> >>
> >> Discussion thread:
> >>
> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html
> >>
> >>
> >> Google Doc:
> >>
> >>
> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Tue, Sep 24, 2019 at 11:43 AM Jingsong Lee 
> >> wrote:
> >>
> >>> Thank you for your reminder.
> >>> Updated.
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Tue, Sep 24, 2019 at 11:36 AM Kurt Young  wrote:
> >>>
>  Looks like the wiki is not aligned with latest google doc, could
>  you update it first?
> 
>  Best,
>  Kurt
> 
> 
>  On Tue, Sep 24, 2019 at 10:19 AM Jingsong Lee <
> lzljs3620...@apache.org>
>  wrote:
> 
> > Hi Flink devs, after another round of discussion.
> >
> > I would like to re-start the voting for FLIP-63
> > Rework table partition support.
> >
> > FLIP wiki:
> > <
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
> >>
> > <
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> >>
> >
> >
> 
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> >
> > Discussion thread:
> > <
> >
> 
> >>
> https://lists.apache.org/thread.html/65078bad6e047578d502e1e5d92026f13fd9648725f5b74ed330@%3Cdev.flink.apache.org%3E
> >>
> > <
> >
> 
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-51-Rework-of-the-Expression-Design-td31653.html
> >>
> >
> >
> 
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html
> >
> > Google Doc:
> > <
> >
> 
> >>
> https://docs.google.com/document/d/1yFDyquMo_-VZ59vyhaMshpPtg7p87b9IYdAtMXv5XmM/edit?usp=sharing
> >>
> >
> >
> 
> >>
> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
> >
> > Thanks,
> >
> > Best,
> > Jingsong Lee
> >
> 
> >>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>>
> >>
> >>
> >> --
> >> Best, Jingsong Lee
> >>
>
>


Re: [COMMITTER] repo locked due to synchronization issues

2019-09-24 Thread Bowen Li
Thanks everyone for sharing your practices! The problem seems to be gone as
usual and I am able to push to ASF repo.

Verified by ASF INFRA [1], it's indeed caused by the mixed push problem
that Fabian said. Quote from INFRA, "This issue can sometimes occur when
people commit conflicting branches at the same time on gitbox vs github, so
we recommend that projects stick with one or the other for commits."

Though I'm alright with pushing to GitHub, can we have a single, standard
way of pushing commits to the ASF repo? Right now we don't seem to have such a
standard way according to the wiki [2]. The standardization helps to not only
avoid the issue mentioned above, but also eradicate problems where, IIRC,
some committers used to forget to reformat commit messages or squash a PR's
commits when merging PRs from the GitHub UI.

That said, I wonder if we can get consensus on pushing commits only to the
ASF gitbox repo and disabling committers' write access to the GitHub mirror?

[1] https://issues.apache.org/jira/browse/INFRA-18992
[2] https://cwiki.apache.org/confluence/display/FLINK/Merging+Pull+Requests

On Tue, Sep 24, 2019 at 4:44 AM Hequn Cheng  wrote:

> I met the same problem. Pushing to the GitHub repo directly works fine and
> it seems will resync the two repos.
>
> Best, Hequn
>
> On Tue, Sep 24, 2019 at 4:59 PM Fabian Hueske  wrote:
>
> > Maybe it's a mix of pushing to the ASF repository and Github mirrors?
> > I'm only pushing to the ASF repositories (although not that frequently
> > anymore...).
> >
> > Cheers, Fabian
> >
> > Am Di., 24. Sept. 2019 um 10:50 Uhr schrieb Till Rohrmann <
> > trohrm...@apache.org>:
> >
> > > Pushing directly to Github also works for me without a problem.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Sep 24, 2019 at 10:28 AM Jark Wu  wrote:
> > >
> > > > Hi Bowen,
> > > >
> > > > I have also encountered this problem. I don't know how to fix this.
> > > > But pushing to GitHub repo always works for me.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 24 Sep 2019 at 06:05, Bowen Li  wrote:
> > > >
> > > > > Hi committers,
> > > > >
> > > > > Recently I've run a repo issue multiple times in different days.
> > When I
> > > > > tried to push a commit to master, git reports the following error:
> > > > >
> > > > > ```
> > > > > remote: This repository has been locked due to synchronization
> > issues:
> > > > > remote:  - /x1/gitbox/broken/flink.txt exists due to a previous
> > error,
> > > > and
> > > > > prevents pushes.
> > > > > remote: This could either be a benign issue, or the repositories
> > could
> > > be
> > > > > out of sync.
> > > > > remote: Please contact us...@infra.apache.org to have
> infrastructure
> > > > > resolve the issue.
> > > > > remote:
> > > > > To https://gitbox.apache.org/repos/asf/flink.git
> > > > >  ! [remote rejected]   master -> master (pre-receive hook
> > declined)
> > > > > error: failed to push some refs to '
> > > > > https://gitbox.apache.org/repos/asf/flink.git'
> > > > > ```
> > > > >
> > > > > This is quite a new issue that didn't come till two or three weeks
> > > ago. I
> > > > > researched online with no luck. I also reported it to ASF INFRA [1]
> > but
> > > > > their suggested solution doesn't work.
> > > > >
> > > > > The issue however usually goes away the next morning in PST, so I
> > > assume
> > > > > someone from a different timezone in Asia or Europe fixes it
> somehow?
> > > Has
> > > > > anyone run into it before? How did you fix it?
> > > > >
> > > > > Thanks,
> > > > > Bowen
> > > > >
> > > > > [1]
> https://issues.apache.org/jira/projects/INFRA/issues/INFRA-18992
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP 69 - Flink SQL DDL Enhancement

2019-09-24 Thread Bowen Li
BTW, will there be a "CREATE/DROP CATALOG" DDL?

Though it's not SQL standard, I can see it'll be useful and handy for our
end users in many cases.
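
Just to sketch what I have in mind (the syntax and property keys below are purely
hypothetical; nothing has been formally proposed yet):

CREATE CATALOG my_hive WITH (
  'type' = 'hive',                        -- hypothetical property keys/values
  'hive-conf-dir' = '/path/to/hive/conf'
);

DROP CATALOG my_hive;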

On Mon, Sep 23, 2019 at 12:28 PM Bowen Li  wrote:

> Hi Terry,
>
> Thanks for driving the effort! I left some comments in the doc.
>
> AFAIU, the biggest motivation is to support DDLs in sql parser so that
> both Table API and SQL CLI can share the stack, despite that SQL CLI has
> already supported some commands itself. However, I don't see details on how
> SQL CLI would migrate and depend on sql parser, and how Table API and SQL
> CLI would actually share SQL parser. I'm not sure yet how much work that
> will take, just want to double check that you didn't include them because
> they are very trivial according to your estimate?
>
>
> On Mon, Sep 16, 2019 at 1:46 AM Terry Wang  wrote:
>
>> Hi everyone,
>>
>> In flink 1.9, we have introduced some awesome features such as complete
>> catalog support[1] and sql ddl support[2]. These features have been a
>> critical integration for Flink to be able to manage data and metadata like
>> a classic RDBMS and make developers more easy to construct their
>> real-time/off-line warehouse or sth similar base on flink.
>>
>> But there is still a lack of support on how Flink SQL DDL to manage
>> metadata and data like classic RDBMS such as `alter table rename` and so on.
>>
>> So I’d like to kick off a discussion on enhancing Flink Sql Ddls:
>>
>> https://docs.google.com/document/d/1mhZmx1h2ecfL0x8OzYD1n-nVRn4yE7pwk4jGed4k7kc/edit?usp=sharing
>> <
>> https://docs.google.com/document/d/1mhZmx1h2ecfL0x8OzYD1n-nVRn4yE7pwk4jGed4k7kc/edit?usp=sharing
>> >
>>
>> In short, it:
>> - Add Catalog DDL enhancement support:  show catalogs / describe
>> catalog / use catalog
> >> - Add Database DDL enhancement support:  show databases / create
>> database / drop database/ alter database
>> - Add Table DDL enhancement support:show tables/ describe
>> table / alter table
>> - Add Function DDL enhancement support: show functions/ create
>> function /drop function
>>
>> Looking forward to your opinions.
>>
>> Best,
>> Terry Wang
>>
>>
>>
> >> [1]: https://issues.apache.org/jira/browse/FLINK-11275
> >> [2]: https://issues.apache.org/jira/browse/FLINK-10232
>
>


Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-24 Thread Bowen Li
Hi Dawid,

Re 1): I agree making it easy for users to run experiments is important.
However, I'm not sure allowing users to register temp functions in
nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks
the current contract between Flink and users that a catalog/db must be valid
in order to be operated on.

How about we instead focus on making it convenient to create catalogs?
Users actually can already do it with ease via program or SQL CLI yaml file
for an in-memory catalog which has neither extra dependency nor external
connections. What we can further improve is DDL for catalogs, and I raised
it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry
now.

In that case, if users would like to experiment via SQL, they can easily
create an in memory catalog/database using DDL, then play with temp
functions.

Re 2): Regarding the assumption, IIUIC, the function's ObjectIdentifier has not been
resolved when the call stack reaches FunctionCatalog#lookupFunction(), but has only
been parsed?

I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
suggested classes, though making ObjectIdentifier a subclass of
FunctionIdentifier seems a bit counterintuitive.

Another potentially simpler way is:

```
// in class FunctionLookup
class Result {
    Optional<ObjectIdentifier> getObjectIdentifier() { ... }
    String getName() { ... }
    // ...
}
```

WDYT?



On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz 
wrote:

> Hi,
> I really like the flip and think it clarifies important aspects of the
> system.
>
> I have two, I hope small suggestions, which will not take much time to
> agree on.
>
> 1. Could we follow the MySQL approach in regards to the existence of cat/db
> for temporary functions? That means not to check it, so e.g. it's possible
> to create a temporary function in a database that does not exist. I think
> it's really useful e.g in cases when user wants to perform experiments but
> does not have access to the db yet or temporarily does not have connection
> to a catalog.
> 2. Could we not change the ObjectIdentifier? Could we not loosen the
> requirements for all catalog objects such as tables, views, types just for
> the functions? It's really important later on from e.g the serializability
> perspective. The important aspect of the ObjectIdentifier is that we know
> that it has been resolved. The suggested changes break that assumption.
>
> What do you think about adding an interface FunctionIdentifier {
>
> String getName();
>
> /**
>   Return 3-part identifier. Empty in case of a built-in function.
> */
> Optional getObjectIdentifier()
> }
>
> class ObjectIdentifier implements FunctionIdentifier {
> Optional getObjectIdentifier() {
>  return Optional.of(this);
> }
> }
>
> class SystemFunctionIdentifier implements FunctionIdentifier {...}
>
> WDYT?
>
> On Wed, 25 Sep 2019, 04:50 Xuefu Z,  wrote:
>
> > +1. LGTM
> >
> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang  wrote:
> >
> > > +1
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > 在 2019年9月24日,上午10:42,Kurt Young  写道:
> > > >
> > > > +1
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li 
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
> reached
> > > >> consensus in [2].
> > > >>
> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
> > > >>
> > > >> Thanks,
> > > >> Bowen
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > >> [2]
> > > >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > >>
> > >
> > >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>


Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-24 Thread Bowen Li
Sorry, I missed some parts of the solution. The complete alternative is the
following, basically having separate APIs in FunctionLookup for ambiguous
and precise function lookup, since the planner is able to tell which API to call
with parsed queries, and having a unified result:

```
class FunctionLookup {

    Optional<Result> lookupAmbiguousFunction(String name);

    Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);

    class Result {
        Optional<ObjectIdentifier> getObjectIdentifier() { ... }
        String getName() { ... }
        // ...
    }
}
```

Thanks,
Bowen


On Tue, Sep 24, 2019 at 9:42 PM Bowen Li  wrote:

> Hi Dawid,
>
> Re 1): I agree making it easy for users to run experiments is important.
> However, I'm not sure allowing users to register temp functions in
> nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks
> the current contract between Flink and users that catalog/db must be valid
> in order to operate on.
>
> How about we instead focus on making it convenient to create catalogs?
> Users actually can already do it with ease via program or SQL CLI yaml file
> for an in-memory catalog which has neither extra dependency nor external
> connections. What we can further improve is DDL for catalogs, and I raised
> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry
> now.
>
> In that case, if users would like to experiment via SQL, they can easily
> create an in memory catalog/database using DDL, then play with temp
> functions.
>
> Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been
> resolved when stack call reaches FunctionCatalog#lookupFunction(), but only
> been parsed?
>
> I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
> suggested classes, though making ObjectIdentifier a subclass of
> FunctionIdentifier seem a bit counter intuitive.
>
> Another potentially simpler way is:
>
> ```
> // in class FunctionLookup
> class Result {
> Optional  getObjectIdentifier() { ... }
> String getName() { ... }
> ...
> }
> ```
>
> WDYT?
>
>
>
> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> wysakowicz.da...@gmail.com> wrote:
>
>> Hi,
>> I really like the flip and think it clarifies important aspects of the
>> system.
>>
>> I have two, I hope small suggestions, which will not take much time to
>> agree on.
>>
>> 1. Could we follow the MySQL approach in regards to the existence of
>> cat/db
>> for temporary functions? That means not to check it, so e.g. it's possible
>> to create a temporary function in a database that does not exist. I think
>> it's really useful e.g in cases when user wants to perform experiments but
>> does not have access to the db yet or temporarily does not have connection
>> to a catalog.
>> 2. Could we not change the ObjectIdentifier? Could we not loosen the
>> requirements for all catalog objects such as tables, views, types just for
>> the functions? It's really important later on from e.g the serializability
>> perspective. The important aspect of the ObjectIdentifier is that we know
>> that it has been resolved. The suggested changes break that assumption.
>>
>> What do you think about adding an interface FunctionIdentifier {
>>
>> String getName();
>>
>> /**
>>   Return 3-part identifier. Empty in case of a built-in function.
>> */
>> Optional getObjectIdentifier()
>> }
>>
>> class ObjectIdentifier implements FunctionIdentifier {
>> Optional getObjectIdentifier() {
>>  return Optional.of(this);
>> }
>> }
>>
>> class SystemFunctionIdentifier implements FunctionIdentifier {...}
>>
>> WDYT?
>>
>> On Wed, 25 Sep 2019, 04:50 Xuefu Z,  wrote:
>>
>> > +1. LGTM
>> >
>> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang  wrote:
>> >
>> > > +1
>> > >
>> > > Best,
>> > > Terry Wang
>> > >
>> > >
>> > >
>> > > > 在 2019年9月24日,上午10:42,Kurt Young  写道:
>> > > >
>> > > > +1
>> > > >
>> > > > Best,
>> > > > Kurt
>> > > >
>> > > >
>> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li 
>> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
>> reached
>> > > >> consensus in [2].
>> > > >>
>> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep
>> 26.
>> > > >>
>> > > >> Thanks,
>> > > >> Bowen
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>> > > >> [2]
>> > > >>
>> > > >>
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>> > > >>
>> > >
>> > >
>> >
>> > --
>> > Xuefu Zhang
>> >
>> > "In Honey We Trust!"
>> >
>>
>


Re: [VOTE] FLIP-68: Extend Core Table System with Modular Plugins

2019-09-25 Thread Bowen Li
Hi,

I'd like to withdraw the vote for the moment. From offline feedback I got,
the community currently lacks the bandwidth to review and vote on this
FLIP. I'd hold back this effort a little bit.

On Tue, Sep 24, 2019 at 3:26 PM Xuefu Z  wrote:

> +1, LGTM
>
> On Mon, Sep 23, 2019 at 10:26 AM Bowen Li  wrote:
>
> > Hi all,
> >
> > I'd like to start a vote for FLIP-68 [1], since there's no more concern
> in
> > the discussion thread [2]
> >
> > The vote will be open for minimum 3 days till 5:30pm UTC, Sep 26.
> >
> > Thanks,
> > Bowen
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
> > [2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>


Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-25 Thread Bowen Li
Re 1) As described in the FLIP, a temp function lookup will first make sure
the db exists. If the db doesn't exist, a lazy drop is triggered to remove
that temp function.

I agree Hive doesn't handle it consistently, and we are not copying Hive.

IMHO, allowing registering temp functions in nonexistent catalog/db is
hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
functions and functions in the current catalog/db, since users cannot
designate a nonexistent catalog/db as current ones, how can they list
functions in nonexistent catalog/db? They may end up never knowing what
temp functions they've created unless trying out with queries or we
introducing some more nonstandard SQL statements. The same applies to other
temp objects like temp tables.

Re 2) A standalone FunctionIdentifier sounds good to me

On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz 
wrote:

> Ad. 1
> I wouldn't say it is hacky.
> Moreover, how do you want ensure that the dB always exists when a temporary
> object is used?( in this particular case function). Do you want to query
> for the database existence whenever e.g a temporary function is used? I
> think important aspect here is that the database can be dropped from
> external system, not just flink or a different flink session.
>
> E.g in case of hive, you cannot create a temporary table in a database that
> does not exist, that's true. But if you create a temporary table in a
> database and drop that database from a different session, you can still
> query the previously created temporary table from the original session. It
> does not sound like a consistent behaviour to me. Why don't we make this
> behaviour of not binding a temporary objects to the lifetime of a database
> explicit as part of the temporary objects contract? In the end they exist
> in different layers. Permanent objects & databases in a catalog (in case of
> hive megastore) whereas temporary objects in flink sessions. That's also
> true for the original hive client. The temporary objects live in the hive
> client whereas databases are created in the metastore.
>
> Ad.2
> I'm open for suggestions here. The one thing I wanted to achieve here is so
> that we do not change the contract of ObjectIdentifier. One important thing
> to remember here is that we need the function identifier to be part of the
> FunctionDefinition object and not only as the result of the function
> lookup. At some point we want to be able to store QueryOperations in the
> catalogs. They can contain function calls within which we need to have the
> identifier.
>
> I agree my initial suggestion is over complicated. How about we have just
> the FunctionIdentifier as top level class without making the
> ObjectIdentifier extend from it? I think it's pretty much the same what you
> suggested. The only difference is that it would be a top level class with a
> more descriptive name.
>
>
> On Wed, 25 Sep 2019, 13:57 Bowen Li,  wrote:
>
> > Sorry, I missed some parts of the solution. The complete alternative is
> the
> > following, basically having separate APIs in FunctionLookup for ambiguous
> > and precise function lookup since planner is able to tell which API to
> call
> > with parsed queries, and have a unified result:
> >
> > ```
> > class FunctionLookup {
> >
> > Optional lookupAmbiguousFunction(String name);
> >
> >
> > Optional lookupPreciseFunction(ObjectIdentifier oi);
> >
> >
> > class Result {
> > Optional  getObjectIdentifier() { ... }
> > String getName() { ... }
> > // ...
> > }
> >
> > }
> > ```
> >
> > Thanks,
> > Bowen
> >
> >
> > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li  wrote:
> >
> > > Hi Dawid,
> > >
> > > Re 1): I agree making it easy for users to run experiments is
> important.
> > > However, I'm not sure allowing users to register temp functions in
> > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and
> > breaks
> > > the current contract between Flink and users that catalog/db must be
> > valid
> > > in order to operate on.
> > >
> > > How about we instead focus on making it convenient to create catalogs?
> > > Users actually can already do it with ease via program or SQL CLI yaml
> > file
> > > for an in-memory catalog which has neither extra dependency nor
> external
> > > connections. What we can further improve is DDL for catalogs, and I
> > raised
> > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by
> Terry
> > > now.
>

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-27 Thread Bowen Li
@Dawid, do you have any other concerns? If not, I hope we can close the
voting.


On Thu, Sep 26, 2019 at 8:14 PM Rui Li  wrote:

> I'm not sure how much benefit #1 can bring us. If users just want to try
> out temporary functions, they can create temporary system functions which
> don't require a catalog/DB. IIUC, the main reason why we allow temporary
> catalog function is to let users override permanent catalog functions.
> Therefore a temporary function in a non-existing catalog won't serve that
> purpose. Besides, each session is provided with a default catalog and DB.
> So even if users simply want to create some catalog functions they can
> forget about after the session, wouldn't the default catalog/DB be enough
> for such experiments?
>
> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li  wrote:
>
> > Re 1) As described in the FLIP, a temp function lookup will first make
> sure
> > the db exists. If the db doesn't exist, a lazy drop is triggered to
> remove
> > that temp function.
> >
> > I agree Hive doesn't handle it consistently, and we are not copying Hive.
> >
> > IMHO, allowing registering temp functions in nonexistent catalog/db is
> > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
> > functions and functions in the current catalog/db, since users cannot
> > designate a nonexistent catalog/db as current ones, how can they list
> > functions in nonexistent catalog/db? They may end up never knowing what
> > temp functions they've created unless trying out with queries or we
> > introducing some more nonstandard SQL statements. The same applies to
> other
> > temp objects like temp tables.
> >
> > Re 2) A standalone FunctionIdentifier sounds good to me
> >
> > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> > wysakowicz.da...@gmail.com>
> > wrote:
> >
> > > Ad. 1
> > > I wouldn't say it is hacky.
> > > Moreover, how do you want ensure that the dB always exists when a
> > temporary
> > > object is used?( in this particular case function). Do you want to
> query
> > > for the database existence whenever e.g a temporary function is used? I
> > > think important aspect here is that the database can be dropped from
> > > external system, not just flink or a different flink session.
> > >
> > > E.g in case of hive, you cannot create a temporary table in a database
> > that
> > > does not exist, that's true. But if you create a temporary table in a
> > > database and drop that database from a different session, you can still
> > > query the previously created temporary table from the original session.
> > It
> > > does not sound like a consistent behaviour to me. Why don't we make
> this
> > > behaviour of not binding a temporary objects to the lifetime of a
> > database
> > > explicit as part of the temporary objects contract? In the end they
> exist
> > > in different layers. Permanent objects & databases in a catalog (in
> case
> > of
> > > hive megastore) whereas temporary objects in flink sessions. That's
> also
> > > true for the original hive client. The temporary objects live in the
> hive
> > > client whereas databases are created in the metastore.
> > >
> > > Ad.2
> > > I'm open for suggestions here. The one thing I wanted to achieve here
> is
> > so
> > > that we do not change the contract of ObjectIdentifier. One important
> > thing
> > > to remember here is that we need the function identifier to be part of
> > the
> > > FunctionDefinition object and not only as the result of the function
> > > lookup. At some point we want to be able to store QueryOperations in
> the
> > > catalogs. They can contain function calls within which we need to have
> > the
> > > identifier.
> > >
> > > I agree my initial suggestion is over complicated. How about we have
> just
> > > the FunctionIdentifier as top level class without making the
> > > ObjectIdentifier extend from it? I think it's pretty much the same what
> > you
> > > suggested. The only difference is that it would be a top level class
> > with a
> > > more descriptive name.
> > >
> > >
> > > On Wed, 25 Sep 2019, 13:57 Bowen Li,  wrote:
> > >
> > > > Sorry, I missed some parts of the solution. The complete alternative
> is
> > > the
> > > > following, basically having separate APIs in FunctionLookup for
>

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-30 Thread Bowen Li
Hi,

I think the points above are valid, and we can adopt the suggestions.

To elaborate a bit on the new SQL syntax, it would imply that, unlike "SHOW
FUNCTIONS" which only returns function names, "SHOW ALL [TEMPORARY]
FUNCTIONS" would return functions' fully qualified names with catalog and
db names.
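
For example, the behavior could roughly look like this (the output below is only a
sketch of the idea, not a finalized format, and the function names are made up):

SHOW FUNCTIONS;
-- returns plain names: concat, my_temp_func, my_catalog_func, ...

SHOW ALL FUNCTIONS;
-- returns fully qualified names across catalogs/databases, plus system functions:
-- concat, mycat.mydb.my_catalog_func, othercat.db1.some_func, ...

SHOW ALL TEMPORARY FUNCTIONS;
-- returns fully qualified names of temporary functions only:
-- mycat.mydb.my_temp_func, ...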



On Mon, Sep 30, 2019 at 6:38 AM Timo Walther  wrote:

> Hi all,
>
> I support Fabian's arguments. In my opinion, temporary objects should
> just be an additional layer on top of the regular catalog/database
> lookup logic. Thus, a temporary table or function has always highest
> precedence and should be stable within the local session. Otherwise it
> could magically disappear while someone else is performing modifications
> in the catalog.
>
> Furthermore, this feature is very useful for prototyping as users can
> simply express that a catalog/database is present even through they
> might not have access to it currently.
>
> Regards,
> Timo
>
>
> On 30.09.19 14:57, Fabian Hueske wrote:
> > Hi all,
> >
> > Sorry for the late reply.
> >
> > I think it might lead to confusing situations if temporary functions (or
> > any temporary db objects for that matter) are bound to the life cycle of
> an
> > (external) db/catalog.
> > Imagine a situation where you create a temp function in a db in an
> > external catalog and use it, but at some point it does not work anymore
> > because someone else dropped the database from the external catalog.
> > Shouldn't temporary objects be only controlled by the owner of a session?
> >
> > I agree that creating temp objects in non-existing db/catalogs sounds a
> bit
> > strange, but IMO the opposite (the db/catalog must exist for a temp
> > function to be created/exist) can have significant implications like the
> > one I described.
> > I think it would be quite easy for users to understand that temporary
> > objects are solely owned by them (and their session).
> > The problem of listing temporary objects could be solved by adding an ALL
> > [TEMPORARY] clause:
> >
> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> > catalog/database including temporary functions.
> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> regardless
> > of the catalog/database.
> >
> > Best,
> > Fabian
> >
> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> bowenl...@gmail.com>:
> >
> >> @Dawid, do you have any other concerns? If not, I hope we can close the
> >> voting.
> >>
> >>
> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li  wrote:
> >>
> >>> I'm not sure how much benefit #1 can bring us. If users just want to
> try
> >>> out temporary functions, they can create temporary system functions
> which
> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> temporary
> >>> catalog function is to let users override permanent catalog functions.
> >>> Therefore a temporary function in a non-existing catalog won't serve
> that
> >>> purpose. Besides, each session is provided with a default catalog and
> DB.
> >>> So even if users simply want to create some catalog functions they can
> >>> forget about after the session, wouldn't the default catalog/DB be
> enough
> >>> for such experiments?
> >>>
> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li  wrote:
> >>>
> >>>> Re 1) As described in the FLIP, a temp function lookup will first make
> >>> sure
> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
> >>> remove
> >>>> that temp function.
> >>>>
> >>>> I agree Hive doesn't handle it consistently, and we are not copying
> >> Hive.
> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db is
> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
> system
> >>>> functions and functions in the current catalog/db, since users cannot
> >>>> designate a nonexistent catalog/db as current ones, how can they list
> >>>> functions in a nonexistent catalog/db? They may end up never knowing
> >>>> what temp functions they've created unless they try them out with
> >>>> queries or we introduce some more nonstandard SQL statements. The same
> >>>> applies to other temp objects like temp tables.
> >>>>
>

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-09-30 Thread Bowen Li
Hi all,

I've updated the FLIP wiki with the following changes:

- Lifespans of temp functions are not tied to those of catalogs and
databases. Users can create temp functions even though the catalogs/dbs in
their fully qualified names don't even exist.
- some new SQL commands
  - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
functions, and names of temp and catalog functions in the current catalog
and db
  - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built-in
functions, and fully qualified names of temp and catalog functions in all
catalogs and dbs
  - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
functions in all catalogs and dbs
  - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
functions

Let me know if you have any questions.

It seems we have resolved all concerns. If there are no more, I'd like to
close the vote by this time tomorrow.

Cheers,
Bowen

On Mon, Sep 30, 2019 at 11:59 AM Bowen Li  wrote:

> Hi,
>
> I think above are some valid points, and we can adopt the suggestions.
>
> To elaborate a bit on the new SQL syntax, it would imply that, unlike
> "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> FUNCTIONS" would return functions' fully qualified names with catalog and
> db names.
>
>
>
> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther  wrote:
>
>> Hi all,
>>
>> I support Fabian's arguments. In my opinion, temporary objects should
>> just be an additional layer on top of the regular catalog/database
>> lookup logic. Thus, a temporary table or function has always highest
>> precedence and should be stable within the local session. Otherwise it
>> could magically disappear while someone else is performing modifications
>> in the catalog.
>>
>> Furthermore, this feature is very useful for prototyping as users can
>> simply express that a catalog/database is present even through they
>> might not have access to it currently.
>>
>> Regards,
>> Timo
>>
>>
>> On 30.09.19 14:57, Fabian Hueske wrote:
>> > Hi all,
>> >
>> > Sorry for the late reply.
>> >
>> > I think it might lead to confusing situations if temporary functions (or
>> > any temporary db objects for that matter) are bound to the life cycle
>> of an
>> > (external) db/catalog.
>> > Imaging a situation where you create a temp function in a db in an
>> external
>> > catalog and use it but at some point it does not work anymore because
>> some
>> > other dropped the database from the external catalog.
>> > Shouldn't temporary objects be only controlled by the owner of a
>> session?
>> >
>> > I agree that creating temp objects in non-existing db/catalogs sounds a
>> bit
>> > strange, but IMO the opposite (the db/catalog must exist for a temp
>> > function to be created/exist) can have significant implications like the
>> > one I described.
>> > I think it would be quite easy for users to understand that temporary
>> > objects are solely owned by them (and their session).
>> > The problem of listing temporary objects could be solved by adding a ALL
>> > [TEMPORARY] clause:
>> >
>> > SHOW ALL FUNCTIONS; could show all functions regardless of the
>> > catalog/database including temporary functions.
>> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>> regardless
>> > of the catalog/database.
>> >
>> > Best,
>> > Fabian
>> >
>> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>> bowenl...@gmail.com>:
>> >
>> >> @Dawid, do you have any other concerns? If not, I hope we can close the
>> >> voting.
>> >>
>> >>
>> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li  wrote:
>> >>
>> >>> I'm not sure how much benefit #1 can bring us. If users just want to
>> try
>> >>> out temporary functions, they can create temporary system functions
>> which
>> >>> don't require a catalog/DB. IIUC, the main reason why we allow
>> temporary
>> >>> catalog function is to let users override permanent catalog functions.
>> >>> Therefore a temporary function in a non-existing catalog won't serve
>> that
>> >>> purpose. Besides, each session is provided with a default catalog and
>> DB.
>> >>> So even if users simply want to create some catalog functions they can
>> &

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-09-30 Thread Bowen Li
Hi Timo,

Re 1) I agree. I renamed the title to "Extend Core Table System with
Pluggable Modules" and all internal references accordingly.

Re 2) First, I'll rename the API to useModules(). The design doesn't forbid
users to call useModules() multiple times. Objects in modules are loaded on
demand instead of eagerly, so there won't be any inconsistency. Users have to
be fully aware of the consequences of resetting modules, as that might mean
that some objects can no longer be referenced or that the resolution order of
some objects changes.
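
To make the resolution-order point concrete, here's a minimal sketch (the
useModules() signature, module names, and the CONCAT example are purely
illustrative, not the final API):

// minimal sketch, assuming the proposed useModules(...) API (not final)
TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.newInstance().build());
tEnv.useModules("core", "hive");   // a function like CONCAT resolves from "core" first
tEnv.useModules("hive", "core");   // resetting the module list flips the resolution order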

Re 3) Yes, we'd leave that to users.

Another approach can be to have a non-optional "Core" module for all
objects that cannot be overridden, like the "CAST" and "AS" functions, and have
an optional "ExtendedCore" module for other replaceable built-in objects.
"Core" should always be positioned first in the module list.

I'm fine with either solution.

Re 4) It may sound like a nice-to-have advanced feature for 1.10, but we can
certainly discuss it fully for the sake of feature completeness.

Unlike other configs, the order of modules would matter in Flink, which
implies the LOAD/UNLOAD commands would not be symmetric with respect to
position. IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the
end of the module list, and UNLOAD MODULE 'x' would be interpreted as removing
x from any position in the list?

I'm thinking of the following list of commands:

SHOW MODULES - list modules in order
LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load the module and append
it to the end of the module list
UNLOAD MODULE 'hive' - remove the module from the module list, with the other
modules keeping their relative positions
USE MODULES 'x' 'y' 'z' (wondering whether the parser can take "'x' 'y' 'z'"?),
or USE MODULES 'x,y,z' - reorder the module list completely


Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-10-01 Thread Bowen Li
Hi Dawid,

Thanks for bringing the suggestions up. I was prototyping yesterday and
ran into exactly the places you pointed out.

For CallExpression and UnresolvedCallExpression, I've added them to
FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark
that as a breaking change.

For FunctionIdentifier, the suggested changes LGTM. I just want to bring up
an issue on naming. It seems to me that how we currently name function
categories is a bit unclear and confusing, which is reflected in the suggested
APIs - in the FunctionIdentifier you laid out, "builtin function" would include
builtin functions and temporary system functions, as we are kind of using
"system" and "built-in" interchangeably, and "catalog function" would include
catalog functions and temporary functions. I can currently see two approaches
to make it clearer to users.

1) Simplify FunctionIdentifier to be the following. As it's internal, we
add comments and explanations for devs on which cases the APIs support.
However, I feel this approach would conflict somewhat with what you want to
achieve in terms of API clarity.

@Internal
class FunctionIdentifier {
  // for built-in functions and temporary system functions
  public FunctionIdentifier of(String name) {  }
  // for temporary functions and catalog functions
  public FunctionIdentifier of(ObjectIdentifier identifier) {  }
  public Optional<String> getFunctionName() {  }
  public Optional<ObjectIdentifier> getObjectIdentifier() {  }
}

2) We can rename our function categories as follows so there'll be mainly
just two categories of functions, "system functions" and "catalog
functions", either of which can have temporary ones:

  - built-in functions -> officially rename to "system functions" and note
to users that "system" and "built-in" can be used interchangeably. We
prefer "system" because that's the keyword we decided to use in the DDL that
creates their temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
  - temporary system functions
  - catalog functions
  - temporary functions -> rename to "temporary catalog functions"

@Internal
class FunctionIdentifier {
  // for temporary/non-temporary system functions
  public FunctionIdentifier ofSystemFunction(String name) {  }
  // for temporary/non-temporary catalog functions
  public FunctionIdentifier ofCatalogFunction(ObjectIdentifier identifier) {  }
  public Optional<String> getSystemFunctionName() {  }
  public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {  }
}

WDYT?


On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske  wrote:

> Thanks for the summary Bowen!
>
> Looks good to me.
>
> Cheers,
> Fabian
>
> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li  >:
>
> > Hi all,
> >
> > I've updated the FLIP wiki with the following changes:
> >
> > - Lifespan of temp functions are not tied to those of catalogs and
> > databases. Users can create temp functions even though catalogs/dbs in
> > their fully qualified names don't even exist.
> > - some new SQL commands
> > - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
> > functions, and names of temp and catalog functions in the current catalog
> > and db
> > - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
> > functions, and fully qualified names of temp and catalog functions in all
> > catalogs and dbs
> > - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
> > functions in all catalog and db
> > - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
> system
> > functions
> >
> > Let me know if you have any questions.
> >
> > Seems we have resolved all concerns. If there's no more ones, I'd like to
> > close the vote by this time tomorrow.
> >
> > Cheers,
> > Bowen
> >
> > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li  wrote:
> >
> > > Hi,
> > >
> > > I think above are some valid points, and we can adopt the suggestions.
> > >
> > > To elaborate a bit on the new SQL syntax, it would imply that, unlike
> > > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> > > FUNCTIONS" would return functions' fully qualified names with catalog
> and
> > > db names.
> > >
> > >
> > >
> > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther 
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I support Fabian's arguments. In my opinion, temporary objects should
> > >> just be an additional layer on top of the regular catalog/database

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-01 Thread Bowen Li
Hi Timo, Dawid,

I've added the suggested SQL and the related changes to the TableEnvironment
API and other classes to the google doc (a rough sketch below). I also removed
"USE MODULE" and its APIs. I'll update the FLIP wiki once we have a consensus.
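
For reference, the TableEnvironment additions currently look roughly like
this (a sketch following Timo's suggestions below; exact signatures are not
final):

// sketch only - the methods that would be added to TableEnvironment
public interface TableEnvironment {
    void loadModule(Module module);    // counterpart of LOAD MODULE 'x' [WITH (...)]
    void unloadModule(String name);    // counterpart of UNLOAD MODULE 'x'
    List<String> listModules();        // counterpart of SHOW MODULES
}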

W.r.t. the descriptor approach, my gut feeling is similar to Dawid's. Besides,
I feel a yaml file would be a better solution for persisting the serializable
state of an environment, as the file itself is already in a serializable
format. Though the yaml file only serves the SQL CLI at this moment, we may be
able to extend its reach to the Table API and allow users to load/offload a
TableEnvironment from/to yaml files, with something like "TableEnvironment
tEnv = TableEnvironment.loadFromYaml()" and
"tEnv.offloadToYaml()" to restore and persist state, and try to
make the yaml file more expressive.


On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz 
wrote:

> Hi Timo, Bowen,
>
> Unfortunately I did not have enough time to go through all the
> suggestions in details so I can not comment on the whole FLIP.
>
> I just wanted to give my opinion on the "descriptor approach in
> loadModule" part. I am not sure if we need it here. We might be
> overthinking this a bit. It definitely makes sense for objects like
> TableSource/TableSink etc. as they are logical definitions that nearly
> always have to be persisted in a Catalog. I'm not sure if we really need
> the same for a whole session. If we need a resume session feature, the
> way to go would probably be to keep the session in memory on the server
> side. I fear we will never be able to serialize the whole session
> entirely (temporary objects, objects derived from DataStream etc.)
>
> I think it is ok to use instances for objects like Catalogs or Modules
> and have an overlay on top of that that can create instances from
> properties.
>
> Best,
>
> Dawid
>
> On 01/10/2019 11:28, Timo Walther wrote:
> > Hi Bowen,
> >
> > thanks for your response.
> >
> > Re 2) I also don't have a better approach for this issue. It is
> > similar to changing the general TableConfig between two statements. It
> > would be good to add your explanation to the design document.
> >
> > Re 3) It would be interesting to know about which "core" functions we
> > are actually talking about. Also for the overriding built-in functions
> > that we discussed in the other FLIP. But I'm fine with leaving it to
> > the user for now. How about we just introduce loadModule(),
> > unloadModule() methods instead of useModules()? This would ensure that
> > users don't forget to add the core module when adding an additional
> > module and they need to explicitly call "unloadModule('core')".
> >
> > Re 4) Every table environment feature should also be designed with SQL
> > statements in mind to verify the concept. SQL is also more popular
> > that Java/Scala API or YAML file. I would like to add it to 1.10 for
> > marking the feature as complete.
> >
> > SHOW MODULES -> sounds good to me, we should add a listModules():
> > List method to table environment
> >
> > LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> > loadModule() method to table environment
> >
> > UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> > table environment
> >
> > I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> > concise API. Users need to load the module anyway with properties.
> > They can also load them "in order" immediately. CREATE TABLE can also
> > not create multiple tables but only one at a time in that order.
> >
> > One thing that came to my mind, shall we use a descriptor approach for
> > loadModule()? The past has shown that passing instances causes
> > problems when persisting objects. That's why we also want to get rid
> > of registerTableSource. I could image that users might want to persist
> > a table environment's state for later use in the future. Even though
> > this is future work, we should already keep such use cases in mind
> > when adding new API methods. What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > On 30.09.19 23:17, Bowen Li wrote:
> >> Hi Timo,
> >>
> >> Re 1) I agree. I renamed the title to "Extend Core Table System with
> >> Pluggable Modules" and all internal references
> >>
> >> Re 2) First, I'll rename the API to useModules(). The design doesn't
> >> forbid
> >> users to call useModules() multi times. Object

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-01 Thread Bowen Li
If something like the yaml file is the way to go to achieve such a goal, we
would cover that with the current design.

On Tue, Oct 1, 2019 at 12:05 Bowen Li  wrote:

> Hi Timo, Dawid,
>
> I've added the suggested SQL and related changes to TableEnvironment API
> and other classes to the google doc. Also removed "USE MODULE" and its
> APIs. Will update FLIP wiki once we have a consensus.
>
> W.r.t. descriptor approach, my gut feeling is similar to Dawid's. Besides,
> I feel yaml file would be a better solution to persist serializable state
> of an environment as the file itself is in serializable format already.
> Though yaml file only serves SQL CLI at this moment, we may be able to
> extend its reach to Table API and allow users to load/offload a
> TableEnvironment from/to yaml files, as something like "TableEnvironment
> tEnv = TableEnvironment.loadFromYaml()" and
> "tEnv.offloadToYaml()" to restore and persist state, and try to
> make yaml file more expressive.
>
>
> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz 
> wrote:
>
>> Hi Timo, Bowen,
>>
>> Unfortunately I did not have enough time to go through all the
>> suggestions in details so I can not comment on the whole FLIP.
>>
>> I just wanted to give my opinion on the "descriptor approach in
>> loadModule" part. I am not sure if we need it here. We might be
>> overthinking this a bit. It definitely makes sense for objects like
>> TableSource/TableSink etc. as they are logical definitions that nearly
>> always have to be persisted in a Catalog. I'm not sure if we really need
>> the same for a whole session. If we need a resume session feature, the
>> way to go would probably be to keep the session in memory on the server
>> side. I fear we will never be able to serialize the whole session
>> entirely (temporary objects, objects derived from DataStream etc.)
>>
>> I think it is ok to use instances for objects like Catalogs or Modules
>> and have an overlay on top of that that can create instances from
>> properties.
>>
>> Best,
>>
>> Dawid
>>
>> On 01/10/2019 11:28, Timo Walther wrote:
>> > Hi Bowen,
>> >
>> > thanks for your response.
>> >
>> > Re 2) I also don't have a better approach for this issue. It is
>> > similar to changing the general TableConfig between two statements. It
>> > would be good to add your explanation to the design document.
>> >
>> > Re 3) It would be interesting to know about which "core" functions we
>> > are actually talking about. Also for the overriding built-in functions
>> > that we discussed in the other FLIP. But I'm fine with leaving it to
>> > the user for now. How about we just introduce loadModule(),
>> > unloadModule() methods instead of useModules()? This would ensure that
>> > users don't forget to add the core module when adding an additional
>> > module and they need to explicitly call "unloadModule('core')".
>> >
>> > Re 4) Every table environment feature should also be designed with SQL
>> > statements in mind to verify the concept. SQL is also more popular
>> > that Java/Scala API or YAML file. I would like to add it to 1.10 for
>> > marking the feature as complete.
>> >
>> > SHOW MODULES -> sounds good to me, we should add a listModules():
>> > List method to table environment
>> >
>> > LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
>> > loadModule() method to table environment
>> >
>> > UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
>> > table environment
>> >
>> > I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
>> > concise API. Users need to load the module anyway with properties.
>> > They can also load them "in order" immediately. CREATE TABLE can also
>> > not create multiple tables but only one at a time in that order.
>> >
>> > One thing that came to my mind, shall we use a descriptor approach for
>> > loadModule()? The past has shown that passing instances causes
>> > problems when persisting objects. That's why we also want to get rid
>> > of registerTableSource. I could image that users might want to persist
>> > a table environment's state for later use in the future. Even though
>> > this is future work, we should already keep such use cases in mind
>> > whe

Re: [ANNOUNCE] Progress of Apache Flink 1.10 #1

2019-10-01 Thread Bowen Li
Thanks Yu and Gary for the detailed summary and update!

On Fri, Sep 27, 2019 at 6:54 AM Yu Li  wrote:

> Hi community,
>
> Since we are now more than one month into the Flink 1.10 release cycle, we
> thought it would be adequate to give a progress update. Below we have
> included a list of the ongoing efforts that we are aware of, together with
> a brief summary of their state. As always, the list is not meant to be
> exhaustive. If you are working on something that is not included here, feel
> free to use this thread to share your progress.
>
> Note that because we are still relatively at the beginning of the release
> cycle, most of the progress is limited to FLIPs that are accepted or being
> voted on.
>
> - Improving Flink’s build system & CI
> - Repository Split [1]
> - Discussed on the ML but consensus to split the repository was not
> reached.
> - Reduce Build Time [2]
> - Discussion is ongoing. Currently, using Azure Pipelines and
> Gradle are being evaluated.
>
> - Support Java 11 [3]
> - Implementation is in progress (18/21 subtasks resolved)
>
> - Table API improvements
> - FLIP-54 Evolve ConfigOption and Configuration [4]
> - Under discussion.
> - FLIP-59 Enable Execution Configuration from Configuration Object [5]
> - Under discussion.
> - Full Data Type Support in Planner [6]
> - Implementation in progress.
> - FLIP-66 Support Time Attribute in SQL DDL [7]
> - FLIP voting.
> - FLIP-70 Support Computed Column [8]
> - Under discussion.
> - FLIP-63 Rework Table Partition Support [9]
> - FLIP voting
> - FLIP-51 Rework of Expression Design [10]
> - FLIP accepted, implementation in progress.
> - FLIP-55 Introduction of a TableAPI Java Expression DSL [11]
> - Under discussion.
> - FLIP-64 Support for Temporary Objects in Table Module [12]
> - Under discussion.
> - FLIP-65 New Type Inference for Table API UDFs
> - Under discussion.
>
> - Hive compatibility completion (DDL/UDF) to support full Hive integration
> - FLIP-57 Rework FunctionCatalog [13]
> - FLIP voting
> - FLIP-68 Extend Core Table System with Modular Plugins [14]
> - FLIP voting was initiated [15] but temporarily withdrawn due to
> lack of community bandwidth.
>
> - Finer grained resource management
> - FLIP-49: Unified Memory Configuration for TaskExecutors [16]
> - FLIP accepted. Implementation is in progress.
> - FLIP-53: Fine Grained Operator Resource Management [17]
> - FLIP accepted. Implementation details are under discussion.
> - FLIP-56: Dynamic Slot Allocation [18]
> - FLIP accepted. Implementation not started yet.
>
> - Finish scheduler re-architecture [19]
> - Implementation is in progress.
>
> - FLIP-27: Refactor Source Interface [20]
> -  FLIP accepted. Implementation is in progress.
>
> - Executor/Client refactoring [21]
>- Discussion already reached consensus
>- FLIP is coming. A PoC implementation is also ready.
>
> - FLIP-36 Support Interactive Programming [22]
> - Reviewing FLIP-67, which changes the intermediate result management
> in runtime, which is what FLIP-36 will be built on top of.
>
> - FLIP-58: Flink Python User-Defined Stateless Function for Table [23]
> - Implementation is in progress (3/15 subtask resolved).
> - Python environment and dependency management under discussion
>
> - FLIP-50: Spill-able Heap Keyed State Backend [24]
> - FLIP was accepted. Implementation is in progress.
>
> - RocksDB Backend Memory Control [25]
> - Verified capping memory usage through Write Buffer Manager [26] works
> in production.
> - New RocksDB version TBD, 5.18.3/6.2.2 has performance regression [27]
> compared to the currently used version 5.17.2.
> - FLIP of MemoryManager interface for reserving memory to be opened.
>
> - Unaligned Checkpoints [28]
> - Design under discussion.
> - FLIP document is under development and will be released shortly
>
> - Separate framework and user class loader in per-job mode [29]
> - Pull request is being reviewed.
>
> - Active Kubernetes Integration [30]
> - PoC completed. More details need to be discussed before updating the
> PRs.
>
> - FLIP-39 Flink ML pipeline and ML libs [31]
> - ML pipeline API PRs (FLINK-13339) have been opened and are being
> reviewed.
> - Algorithms are waiting for the new ML pipeline API to be merged.
>
> - Add vertex subtask log url on WebUI [32]
> - This makes it easier for users of the WebUI to access the logs of the
> TaskManager that executes a specific subtask.
> - A pull request is opened and currently being reviewed.
>
> As a reminder, the feature freeze is targeted to be at the end of November.
> This leaves us with approximately another 2 months of development time. We
> will send another announcement later in the release cycle to make the date
> of the feature freeze offi

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-10-02 Thread Bowen Li
Introducing a new term "path" to APIs like
"getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
users, thus I feel "getSimpleName/getIdentifier" is fine.

To summarize the discussion results:

   - categorize functions along 2 dimensions - system vs. catalog, non-temp
   vs. temp - which gives us 4 combinations
   - definition of FunctionIdentifier

@PublicEvolving
class FunctionIdentifier {

    String name;

    ObjectIdentifier oi;

    // for temporary/non-temporary system function
    public FunctionIdentifier of(String name) {  }
    // for temporary/non-temporary catalog function
    public FunctionIdentifier of(ObjectIdentifier identifier) {  }

    Optional<ObjectIdentifier> getIdentifier() {}

    Optional<String> getSimpleName() {}
}
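
For illustration, usage of the two factory methods would look roughly like
this (a hypothetical example; it assumes the existing
ObjectIdentifier.of(catalog, database, name) factory, and the function names
are made up):

FunctionIdentifier sys = FunctionIdentifier.of("my_temp_system_func");
FunctionIdentifier cat = FunctionIdentifier.of(
    ObjectIdentifier.of("mycatalog", "mydb", "my_catalog_func"));
sys.getSimpleName();   // Optional.of("my_temp_system_func")
cat.getIdentifier();   // Optional of the full mycatalog.mydb.my_catalog_func identifier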


I've updated them in the FLIP wiki. Please take a final look. I'll close the
voting if no other concerns are raised within 24 hours.

Cheers

On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz 
wrote:

> Hi,
>
> I very much agree with Xuefu's summary of the two points, especially on
> the "functionIdentifier doesn't need to reflect the categories".
>
> For the factory methods I think methods of should be enough:
>
>   // for temporary/non-temporary system function
> public FunctionIdentifier of(String name) {  }
>   // for temporary/non-temporary catalog function
> public FunctionIdentifier of(ObjectIdentifier identifier){  }
>
> In case of the getters I did not like the method name `getName` in the
> original proposal, as in my opinion it could imply that it can return
> also just the name part of an ObjectIdentifier, which should not be the
> case.
>
> I'm fine with getSimpleName/getIdentifier, but want to throw in few
> other suggestions:
>
> * getShortPath(Identifier)/getLongPath(Identifier),
>
> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>
> +1 to any of the 3 options.
>
> One additional thing the FunctionIdentifier should be a PublicEvolving
> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
> user might need to access e.g. in a filter pushdown.
>
> I also support the Xuefu's suggestion not to support the "ALL" keyword
> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
> of it  is not part of the FLIP-57, we do not need to agree on that in
> this thread.
>
> Overall I think after updating the FLIP with the outcome of the
> discussion I vote +1 for it.
>
> Best,
>
> Dawid
>
>
> On 02/10/2019 00:21, Xuefu Z wrote:
> > Here are some of my thoughts on the minor debates above:
> >
> > 1. +1 for 4 categories of functions. They are categorized along two
> > dimensions of binary values: X: *temporary* vs non-temporary
> (persistent);
> > Y: *system* vs non-system (so said catalog).
> > 2. In my opinion, class functionIdentifier doesn't really need to reflect
> > the categories of the functions. Instead, we should decouple them to make
> > the API more stable. Thus, my suggestion is:
> >
> > @Internal
> > class FunctionIdentifier {
> >   // for temporary/non-temporary system function
> > public FunctionIdentifier ofSimpleName(String name) {  }
> >   // for temporary/non-temporary catalog function
> > public FunctionIdentifier ofIdentifier(ObjectIdentifier
> > identifier){  }
> > public Optional getSimpleName() {  }
> > public Optional getIdentifier() {  }
> > }
> > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
> >
> > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
> >
> > When either keyword is missing, "ALL" is implied along that dimension. We
> > should always limit the search to the system function catalog and the
> > current catalog/DB. I don't see a need of listing functions across
> > different catalogs and databases. (It can be added later if that arises.)
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li  wrote:
> >
> >> Hi Dawid,
> >>
> >> Thanks for bringing the suggestions up. I was prototyping yesterday and
> >> found out those places exactly as what you suggested.
> >>
> >> For CallExpression and UnresolvedCallExpression, I've added them to
> >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and
> mark
> >> that as a breaking change
> >>
> >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring
> up
> >> an issue on naming. It seems to me how we now name functions categories
> is
> >> a bit unclear 

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-10-03 Thread Bowen Li
I'm glad to announce that the community has accepted the design of FLIP-57,
and we are moving forward with implementing it.

Thanks everyone!

On Wed, Oct 2, 2019 at 11:01 AM Bowen Li  wrote:

> Introducing a new term "path" to APIs like
> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
> users, thus I feel "getSimpleName/getIdentifier" is fine.
>
> To summarize the discussion result.
>
>- categorize functions into 2 dimensions - system v.s. catalog,
>non-temp v.s. temp - and that give us 4 combinations
>- definition of FunctionIdentifier
>
>  @PublicEvolving
>
> Class FunctionIdentifier {
>
> String name;
>
> ObjectIdentifier oi;
>
> // for temporary/non-temporary system function
> public FunctionIdentifier of(String name) {  }
> // for temporary/non-temporary catalog function
> public FunctionIdentifier of(ObjectIdentifier identifier) {  }
>
>
> Optional getIdentifier() {}
>
> Optional getSimpleName() {}
>
> }
>
>
> I've updated them to FLIP wiki. Please take a final look. I'll close the
> voting if there's no other concern raised within 24 hours.
>
> Cheers
>
> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz 
> wrote:
>
>> Hi,
>>
>> I very much agree with Xuefu's summary of the two points, especially on
>> the "functionIdentifier doesn't need to reflect the categories".
>>
>> For the factory methods I think methods of should be enough:
>>
>>   // for temporary/non-temporary system function
>> public FunctionIdentifier of(String name) {  }
>>   // for temporary/non-temporary catalog function
>> public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>
>> In case of the getters I did not like the method name `getName` in the
>> original proposal, as in my opinion it could imply that it can return
>> also just the name part of an ObjectIdentifier, which should not be the
>> case.
>>
>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
>> other suggestions:
>>
>> * getShortPath(Identifier)/getLongPath(Identifier),
>>
>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>>
>> +1 to any of the 3 options.
>>
>> One additional thing the FunctionIdentifier should be a PublicEvolving
>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
>> user might need to access e.g. in a filter pushdown.
>>
>> I also support the Xuefu's suggestion not to support the "ALL" keyword
>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
>> of it  is not part of the FLIP-57, we do not need to agree on that in
>> this thread.
>>
>> Overall I think after updating the FLIP with the outcome of the
>> discussion I vote +1 for it.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 02/10/2019 00:21, Xuefu Z wrote:
>> > Here are some of my thoughts on the minor debates above:
>> >
>> > 1. +1 for 4 categories of functions. They are categorized along two
>> > dimensions of binary values: X: *temporary* vs non-temporary
>> (persistent);
>> > Y: *system* vs non-system (so said catalog).
>> > 2. In my opinion, class functionIdentifier doesn't really need to
>> reflect
>> > the categories of the functions. Instead, we should decouple them to
>> make
>> > the API more stable. Thus, my suggestion is:
>> >
>> > @Internal
>> > class FunctionIdentifier {
>> >   // for temporary/non-temporary system function
>> > public FunctionIdentifier ofSimpleName(String name) {  }
>> >   // for temporary/non-temporary catalog function
>> > public FunctionIdentifier ofIdentifier(ObjectIdentifier
>> > identifier){  }
>> > public Optional getSimpleName() {  }
>> > public Optional getIdentifier() {  }
>> > }
>> > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
>> >
>> > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
>> >
>> > When either keyword is missing, "ALL" is implied along that dimension.
>> We
>> > should always limit the search to the system function catalog and the
>> > current catalog/DB. I don't see a need of listing functions across
>> > different catalogs and databases. (It can be added later if that
>> arises.)
>> >
>> > Thanks,
>> > Xuefu
>> >
>> &

Re: [VOTE] FLIP-57: Rework FunctionCatalog

2019-10-06 Thread Bowen Li
Hi Aljoscha, Timo

Thanks for the reminder. I've updated the details in the FLIP wiki, and will
kick off a voting thread.

On Fri, Oct 4, 2019 at 1:51 PM Timo Walther  wrote:

> Hi,
>
> I agree with Aljoscha. It is not transparent to me which votes are
> binding to the current status of the FLIP.
>
> Some other minor comments from my side:
>
> - We don't need to deprecate methods in FunctionCatalog. This class is
> internal. We can simply change the method signatures.
> - `String name` is missing in the FunctionIdentifier code example; can
> we call FunctionIdentifier.getSimpleName() just
> FunctionIdentifier.getName()?
> - Add the methods that we discussed to the example:  `of(String)`,
> `of(ObjectIdentifier)`
>
> Other than that, I'm happy to give my +1 to this proposal.
>
> Thanks for the productive discussion,
> Timo
>
>
> On 04.10.19 13:29, Aljoscha Krettek wrote:
> > Hi,
> >
> > I see there was quite some discussion and changes on the FLIP after this
> VOTE was started. I would suggest to start a new voting thread on the
> current state of the FLIP (keeping in mind that a FLIP vote needs at least
> three committer/PMC votes).
> >
> > For the future, we should probably keep discussion to the [DISCUSS]
> thread and use the vote thread only for voting.
> >
> > Best,
> > Aljoscha
> >
> >> On 3. Oct 2019, at 21:17, Bowen Li  wrote:
> >>
> >> I'm glad to announce that the community has accepted the design of
> FLIP-57,
> >> and we are moving forward to implementing it.
> >>
> >> Thanks everyone!
> >>
> >> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li  wrote:
> >>
> >>> Introducing a new term "path" to APIs like
> >>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing
> to
> >>> users, thus I feel "getSimpleName/getIdentifier" is fine.
> >>>
> >>> To summarize the discussion result.
> >>>
> >>>- categorize functions into 2 dimensions - system v.s. catalog,
> >>>non-temp v.s. temp - and that give us 4 combinations
> >>>- definition of FunctionIdentifier
> >>>
> >>>  @PublicEvolving
> >>>
> >>> Class FunctionIdentifier {
> >>>
> >>> String name;
> >>>
> >>> ObjectIdentifier oi;
> >>>
> >>> // for temporary/non-temporary system function
> >>> public FunctionIdentifier of(String name) {  }
> >>> // for temporary/non-temporary catalog function
> >>> public FunctionIdentifier of(ObjectIdentifier identifier) {  }
> >>>
> >>>
> >>> Optional getIdentifier() {}
> >>>
> >>> Optional getSimpleName() {}
> >>>
> >>> }
> >>>
> >>>
> >>> I've updated them to FLIP wiki. Please take a final look. I'll close
> the
> >>> voting if there's no other concern raised within 24 hours.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <
> dwysakow...@apache.org>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I very much agree with Xuefu's summary of the two points, especially
> on
> >>>> the "functionIdentifier doesn't need to reflect the categories".
> >>>>
> >>>> For the factory methods I think methods of should be enough:
> >>>>
> >>>>   // for temporary/non-temporary system function
> >>>> public FunctionIdentifier of(String name) {  }
> >>>>   // for temporary/non-temporary catalog function
> >>>> public FunctionIdentifier of(ObjectIdentifier identifier){  }
> >>>>
> >>>> In case of the getters I did not like the method name `getName` in the
> >>>> original proposal, as in my opinion it could imply that it can return
> >>>> also just the name part of an ObjectIdentifier, which should not be
> the
> >>>> case.
> >>>>
> >>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
> >>>> other suggestions:
> >>>>
> >>>> * getShortPath(Identifier)/getLongPath(Identifier),
> >>>>
> >>>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
> >>>>
> >>>> +1 to

[VOTE] FLIP-57: Rework FunctionCatalog, latest updated

2019-10-06 Thread Bowen Li
Hi all,

I'd like to start a new voting thread for FLIP-57 [1] on its latest status,
superseding the earlier vote in [2], as we've reached consensus in [2] and [3].

This vote will be open for a minimum of 3 days, till 6:45am UTC, Oct 10.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2] https://www.mail-archive.com/dev@flink.apache.org/msg30180.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613


Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-09 Thread Bowen Li
Thanks everyone for your review.

After discussing with Timo and Dawid offline, as well as incorporating
feedback from Xuefu and Jark on the mailing list, I decided to make a few
critical changes to the proposal.

- renamed the keyword "type" to "kind". The community plans to have the
"type" keyword in yaml/descriptors refer to data types exclusively in the
near future, and we should cater to that change in our design
- allowed specifying names for modules to simplify and unify the module
loading/unloading syntax between the programmatic API and SQL. Here are the
proposed changes:
SQL:
 LOAD MODULE "name" WITH ("kind"="xxx" [, (properties)])
 UNLOAD MODULE "name";
Table:
 tEnv.loadModule("name", new Xxx(properties));
 tEnv.unloadModule("name");

I have completely updated the google doc [1]. Please take another look, and
let me know if you have any other questions. Thanks!

[1]
https://docs.google.com/document/d/17CPMpMbPDjvM4selUVEfh_tqUK_oV0TODAUA9dfHakc/edit#


On Tue, Oct 8, 2019 at 6:26 AM Jark Wu  wrote:

> Hi Bowen,
>
> Thanks for the proposal. I have two thoughts:
>
> 1) Regarding to "loadModule", how about
> tableEnv.loadModule("xxx" [, propertiesMap]);
> tableEnv.unloadModule(“xxx”);
>
> This makes the API similar to SQL. IMO, instance of Module is not needed
> and verbose as parameter.
> And this makes it easier to load a simple module without any additional
> properties, e.g. tEnv.loadModule("GEO"), tEnv.unloadModule("GEO")
>
> 2) In current design, the module interface only defines function metadata,
> but no implementations.
> I'm wondering how to call/map the implementations in runtime? Am I missing
> something?
>
> Besides, I left some minor comments in the doc.
>
> Best,
> Jark
>
>
> On Sat, 5 Oct 2019 at 08:42, Xuefu Z  wrote:
>
> > I agree with Timo that the new table APIs need to be consistent. I'd go
> > further that an name (or id) is needed for module definition in YAML
> file.
> > In the current design, name is skipped and type has binary meanings.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Oct 4, 2019 at 5:24 AM Timo Walther  wrote:
> >
> > > Hi everyone,
> > >
> > > first, I was also questioning my proposal. But Bowen's proposal of
> > > `tEnv.offloadToYaml()` would not work with the current
> design
> > > because we don't know how to serialize a catalog or module into
> > > properties. Currently, there is no converter from instance to
> > > properties. It is a one way conversion. We can add a `toProperties`
> > > method to both Catalog and Module class in the future to solve this.
> > > Solving the table environment serializability can be future work.
> > >
> > > However, I find the current proposal for the TableEnvironment methods
> is
> > > contradicting:
> > >
> > > tableEnv.loadModule(new Yyy());
> > > tableEnv.unloadModule(“Xxx”);
> > >
> > > The loading is specified programmatically whereas the unloading
> requires
> > > a string that is not specified in the module itself. But is defined in
> > > the factory according to the current design.
> > >
> > > SQL does it more consistently. There, the name `xxx` is used when
> > > loading and unloading the module:
> > >
> > > LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> > > UNLOAD MODULE 'xxx’
> > >
> > > How about:
> > >
> > > tableEnv.loadModule("xxx", new Yyy());
> > > tableEnv.unloadModule(“xxx”);
> > >
> > > This would be similar to the catalog interfaces. The name is not part
> of
> > > the instance itself.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > >
> > >
> > > On 01.10.19 21:17, Bowen Li wrote:
> > > > If something like the yaml file is the way to go and achieve such
> > > > motivation, we would cover that with current design.
> > > >
> > > > On Tue, Oct 1, 2019 at 12:05 Bowen Li  wrote:
> > > >
> > > >> Hi Timo, Dawid,
> > > >>
> > > >> I've added the suggested SQL and related changes to TableEnvironment
> > API
> > > >> and other classes to the google doc. Also removed "USE MODULE" and
> its
> > > >> APIs. Will update FLIP wiki once we have a consensus.
> > > >>
> > &

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-10-09 Thread Bowen Li
Hi Dawid,

+1 for the proposed changes

On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz 
wrote:

> Sorry for a very delayed response.
>
> @Kurt Yes, this is the goal to have a function created like new
> Function(...) also be wrapped into CatalogFunction. This would have to
> be though a temporary function as we cannot represent that as a set of
> properties. Similar to the createTemporaryView(DataStream stream).
>
> As for the ConnectTableDescriptor I agree this is very similar to
> CatalogTable. I am not sure though if we should get rid of it. In the
> end I see it as a builder for a CatalogTable, which is a slightly more
> internal API, but we might revisit that some time in the future if we
> find that it makes more sense.
>
> @All I updated the FLIP page with some more details from the outcome of
> the discussions around FLIP-57. Please take a look. I would like to
> start a vote on this FLIP as soon as the vote on FLIP-57 goes through.
>
> Best,
>
> Dawid
>
>
> On 19/09/2019 09:24, Kurt Young wrote:
> > IIUC it's good to see that both serializable (tables description from
> DDL)
> > and unserializable (tables with DataStream underneath) tables are treated
> > unify with CatalogTable.
> >
> > Can I also assume functions that either come from a function class (from
> > DDL)
> > or function objects (newed by user) will also treated unify with
> > CatalogFunction?
> >
> > This will greatly simplify and unify current API level concepts and
> design.
> >
> > And it seems only one thing left, how do we deal with
> > ConnectTableDescriptor?
> > It's actually very similar with serializable CatalogTable, both carry
> some
> > text
> > properties which even are the same. Is there any chance we can further
> unify
> > this to CatalogTable?
> >
> > object
> > Best,
> > Kurt
> >
> >
> > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu  wrote:
> >
> >> Thanks Dawid for the design doc.
> >>
> >> In general, I’m +1 to the FLIP.
> >>
> >>
> >> +1 to the single-string and parse way to express object path.
> >>
> >> +1 to deprecate registerTableSink & registerTableSource.
> >> But I would suggest to provide an easy way to register a custom
> >> source/sink before we drop them (this is another story).
> >> Currently, it’s not easy to implement a custom connector descriptor.
> >>
> >> Best,
> >> Jark
> >>
> >>
> >>> 在 2019年9月19日,11:37,Dawid Wysakowicz  写道:
> >>>
> >>> Hi JingsongLee,
> >>> From my understanding they can. Underneath they will be CatalogTables.
> >> The
> >>> difference is the lifetime of the tables. Plus some of the user facing
> >>> interfaces cannot be persisted e.g. datastream. Therefore we must have
> a
> >>> separate methods for that. In the end the temporary tables are held in
> >>> memory as CatalogTables.
> >>> Best,
> >>> Dawid
> >>>
> >>> On Thu, 19 Sep 2019, 10:08 JingsongLee,  >> .invalid>
> >>> wrote:
> >>>
>  Hi dawid:
>  Can temporary tables achieve the same capabilities as catalog table?
>  like statistics: CatalogTableStatistics, CatalogColumnStatistics,
>  PartitionStatistics
>  like partition support: we have added some catalog equivalent
> interfaces
>  on TableSource/TableSink: getPartitions, getPartitionFieldNames
>  Maybe it's not a good idea to add these interfaces to
>  TableSource/TableSink. What do you think?
> 
>  Best,
>  Jingsong Lee
> 
> 
>  --
>  From:Kurt Young 
>  Send Time:2019年9月18日(星期三) 17:54
>  To:dev 
>  Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>  module
> 
>  Hi all,
> 
>  Sorry to join this party late. Big +1 to this flip, especially for the
>  dropping
>  "registerTableSink & registerTableSource" part. These are indeed
> legacy
>  and we should try to unify them through CatalogTable after we
> introduce
>  the concept of Catalog.
> 
>  From my understanding, what we can registered should all be metadata,
>  TableSource/TableSink should only be the one who is responsible to do
>  the real work, i.e. reading and writing data according to the schema
> and
>  other information like computed column, partition, .e.g.
> 
>  Best,
>  Kurt
> 
> 
>  On Wed, Sep 18, 2019 at 5:14 PM JingsongLee   .invalid>
>  wrote:
> 
> > After some development and thinking, I have a general understanding.
> > +1 to registering a source/sink does not fit into the SQL world.
> > I am OK to have a deprecated registerTemporarySource/Sink to
> compatible
> > with old ways.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Timo Walther 
> > Send Time:2019年9月17日(星期二) 08:00
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> >
> > Hi Dawid,
> >
> > thanks for the design 

Re: [VOTE] FLIP-57: Rework FunctionCatalog, latest updated

2019-10-15 Thread Bowen Li
Hi all,

I hereby announce the FLIP has passed with 6 +1 votes, 4 binding (Dawid,
Timo, Aljoscha, Jark) and 2 non-binding (Xuefu, Jingsong).

Thanks for your review and participation!



On Thu, Oct 10, 2019 at 1:08 AM Jingsong Li  wrote:

> +1
>
> Best,
> Jingsong Lee
>
> On Thu, Oct 10, 2019 at 3:38 PM Jark Wu  wrote:
>
> > +1
> >
> > Thanks,
> > Jark
> >
> > On Wed, 9 Oct 2019 at 01:03, Xuefu Z  wrote:
> >
> > > +1
> > >
> > > On Tue, Oct 8, 2019 at 7:00 AM Aljoscha Krettek 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > > On 8. Oct 2019, at 15:35, Timo Walther  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > Thanks for driving these efforts,
> > > > > Timo
> > > > >
> > > > > On 07.10.19 10:10, Dawid Wysakowicz wrote:
> > > > >> +1 for the FLIP.
> > > > >>
> > > > >> Best,
> > > > >>
> > > > >> Dawid
> > > > >>
> > > > >> On 07/10/2019 08:45, Bowen Li wrote:
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I'd like to start a new voting thread for FLIP-57 [1] on its
> latest
> > > > status
> > > > >>> despite [2], and we've reached consensus in [2] and [3].
> > > > >>>
> > > > >>> This voting will be open for minimum 3 days till 6:45am UTC, Oct
> > 10.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Bowen
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > > >>> [2]
> > https://www.mail-archive.com/dev@flink.apache.org/msg30180.html
> > > > >>> [3]
> > > > >>>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > > >>>
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Xuefu Zhang
> > >
> > > "In Honey We Trust!"
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-15 Thread Bowen Li
 general this problem is unsolved
> >>>> for now, also Kafka tables could clash if you read from two Kafka
> >>>> clusters with different versions.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 10.10.19 08:01, Jark Wu wrote:
> >>>>> Hi Xuefu,
> >>>>>
> >>>>> If there is only one instance per type, then what's the "name" used
> >> for?
> >>>>> Could we remove it and only keep "type" or "kind" to identify
> modules?
> >>>>>
> >>>>> Best,
> >>>>> Jark
> >>>>>
> >>>>> On Thu, 10 Oct 2019 at 11:21, Xuefu Z  wrote:
> >>>>>
> >>>>>> Jark has a good point. However, I think validation logic can put in
> >>>> place
> >>>>>> to restrict one instance per type. Maybe the doc needs to be
> specific
> >> on
> >>>>>> this.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Xuefu
> >>>>>>
> >>>>>> On Wed, Oct 9, 2019 at 7:41 PM Jark Wu  wrote:
> >>>>>>
> >>>>>>> Thanks Bowen for the updating.
> >>>>>>>
> >>>>>>> I have some different opinions on the change.
> >>>>>>> IIUC, in the previous design, the "name" is also the "id" or "type"
> >> to
> >>>>>>> identify which module to load. Which means we can only load one
> >>>> instance
> >>>>>> of
> >>>>>>> a module.
> >>>>>>> In the new design, the "name" is just an alias to the module
> >> instance,
> >>>>>> the
> >>>>>>> "kind" is used to identify modules. Which means we can load
> different
> >>>>>>> instances of a module.
> >>>>>>> However, what's the "name" or alias used for? Do we need to support
> >>>>>> loading
> >>>>>>> different instances of a module? From my point of view, it brings
> >> more
> >>>>>>> complexity and confusion.
> >>>>>>> For example, if we load a "hive121" which uses HiveModule with
> >> version
> >>>>>>> 1.2.1 and load a "hive234" which uses HiveModule with version
> 2.3.4,
> >>>> then
> >>>>>>> how to solve the class conflict problem?
> >>>>>>>
> >>>>>>> IMO, a module can only be load once in a session, so "name" maybe
> >>>>>> useless.
> >>>>>>> So my proposal is similar to the previous one, but only change
> "name"
> >>>> to
> >>>>>>> "kind".
> >>>>>>>
> >>>>>>>   SQL:
> >>>>>>> LOAD MODULE "kind" [WITH (properties)];
> >>>>>>> UNLOAD MODULE "kind";
> >>>>>>>Table:
> >>>>>>> tEnv.loadModule("kind" [, properties]);
> >>>>>>> tEnv.unloadModule("kind");
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jark
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, 9 Oct 2019 at 20:38, Bowen Li  wrote:
> >>>>>>>
> >>>>>>>> Thanks everyone for your review.
> >>>>>>>>
> >>>>>>>> After discussing with Timo and Dawid offline, as well as
> >> incorporating
> >>>>>>>> feedback from Xuefu and Jark on mailing list, I decided to make a
> >> few
> >>>>>>>> critical changes to the proposal.
> >>>>>>>>
> >>>>>>>> - renamed the keyword "type" to "kind". The community has plan to
> >> have
> >>>>>>

[VOTE] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-15 Thread Bowen Li
Hi all,

I'd like to kick off a voting thread for FLIP-68: Extend Core Table System
with Modular Plugins [1], as we have reached consensus in [2].

The voting period will be open for at least 72 hours, ending at 5pm Oct 18,
UTC.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
[2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html


Re: [VOTE] FLIP-64: Support for Temporary Objects in Table module

2019-10-15 Thread Bowen Li
+1

On Tue, Oct 15, 2019 at 5:09 AM Jark Wu  wrote:

> +1 from my side.
>
> Cheers,
> Jark
>
> On Tue, 15 Oct 2019 at 19:11, vino yang  wrote:
>
> > +1
> >
> > Best,
> > Vino
> >
> > Aljoscha Krettek  于2019年10月15日周二 下午4:31写道:
> >
> > > +1
> > >
> > > Best,
> > > Aljoscha
> > >
> > > > On 14. Oct 2019, at 14:55, Kurt Young  wrote:
> > > >
> > > > +1
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Fri, Oct 11, 2019 at 1:39 PM Dawid Wysakowicz <
> > dwysakow...@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Hi everyone,
> > > >> I would like to start a vote on FLIP-64. The discussion seems to
> have
> > > >> reached an agreement.
> > > >>
> > > >> Please vote for the following design document:
> > > >>
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
> > > >>
> > > >>
> > > >> The discussion can be found at:
> > > >>
> > > >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-64-Support-for-Temporary-Objects-in-Table-module-td32684.html
> > > >>
> > > >>
> > > >> This voting will be open for at least 72 hours. I'll try to close it
> > on
> > > >> 2019-10-16 14:00 UTC, unless there is an objection or not enough
> > votes.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >>
> > > >>
> > >
> > >
> >
>


Re: [Discussion] FLIP-79 Flink Function DDL Support

2019-10-15 Thread Bowen Li
Hi Zhenqiu,

Thanks for taking on this effort!

A couple questions:
- Though this FLIP is about function DDL, can we also think about how the
created functions would be mapped to CatalogFunction, and see if we need to
modify the CatalogFunction interface? Syntax changes need to be backed by the
backend.
- Can we define a clearer, smaller scope targeting Flink 1.10 among all
the proposed changes? The current overall scope seems quite wide, and
it may be unrealistic to get everything in a single release, or even a
couple. However, I believe the most common user story can be something as
simple as "being able to create and persist a Java class-based UDF and use
it later in queries" (a rough sketch below), which would add great value for
most Flink users and is achievable in 1.10.
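
To illustrate that user story, here's a minimal sketch; the DDL syntax and
all names are purely illustrative, pending what FLIP-79 finally settles on:

// a plain Java UDF that a user would want to persist and reuse
import org.apache.flink.table.functions.ScalarFunction;

public class MyUpper extends ScalarFunction {
    public String eval(String s) {
        return s == null ? null : s.toUpperCase();
    }
}

// and later, in another session, something along the lines of:
//   CREATE FUNCTION mydb.my_upper AS 'com.example.MyUpper';
//   SELECT my_upper(name) FROM users;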

Bowen

On Sun, Oct 13, 2019 at 10:46 PM Peter Huang 
wrote:

> Dear Community,
>
> FLIP-79 Flink Function DDL Support
> <
> https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#
> >
>
> This proposal aims to support function DDL with the consideration of SQL
> syntax, language compliance, and advanced external UDF lib registration.
> The Flink DDL was initially proposed and discussed in the design
> <
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil
> >
> [1] by Shuyi Chen and Timo. As the initial discussion mainly focused on
> table, type, and view, FLIP-69 [2] extends it with a more detailed
> discussion of DDL for catalog, database, and function. Originally, the
> function DDL was under the scope of FLIP-69. After some discussion
>  with the community, we
> found that there are several ongoing efforts, such as FLIP-64 [3], FLIP-65
> [4], and FLIP-78 [5]. As they will directly impact the SQL syntax of
> function DDL, this proposal aims to describe the problem clearly, take the
> existing work into consideration, and make sure the design aligns with the
> ongoing API changes for temporary objects and type inference for UDFs
> defined in different languages.
>
> The FLIP outlines the requirements from related work, and proposes a SQL
> syntax to meet those requirements. The corresponding implementation is also
> discussed. Please kindly review and give feedback.
>
>
> Best Regards
> Peter Huang
>


Re: [VOTE] FLIP-68: Extend Core Table System with Modular Plugins

2019-10-15 Thread Bowen Li
Sorry, please ignore this thread, as the FLIP's name should be "Extend Core
Table System with Pluggable Modules".

On Tue, Oct 15, 2019 at 9:59 AM Bowen Li  wrote:

> Hi all,
>
> I'd like to kick off a voting thread for FLIP-68: Extend Core Table System
> with Modular Plugins [1], as we have reached consensus in [2].
>
> The voting period will be open for at least 72 hours, ending at 5pm Oct
> 18, UTC.
>
> Thanks,
> Bowen
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
> [2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html
>
>


[VOTE] FLIP-68: Extend Core Table System with Pluggable Modules

2019-10-15 Thread Bowen Li
Hi all,

I'd like to kick off a voting thread for FLIP-68: Extend Core Table System
with Pluggable Modules [1], as we have reached consensus in [2].

The voting period will be open for at least 72 hours, ending at 7pm Oct 18
UTC.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
[2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html


Re: [VOTE] Drop Python 2 support for 1.10

2019-10-15 Thread Bowen Li
+1

On Sun, Oct 13, 2019 at 10:54 PM Hequn Cheng  wrote:

> +1
>
> Thanks a lot for driving this, Dian!
>
> On Mon, Oct 14, 2019 at 1:46 PM jincheng sun 
> wrote:
>
> > +1
> >
> > > On Mon, Oct 14, 2019 at 1:21 PM, Dian Fu  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for "Drop Python 2 support for 1.10",
> > which
> > > is discussed and reached a consensus in the discussion thread[1].
> > >
> > > The vote will be open for at least 72 hours. Unless there is an
> > objection,
> > > I will try to close it by Oct 17, 2019 18:00 UTC if we have received
> > > sufficient votes.
> > >
> > > Regards,
> > > Dian
> > >
> > > [1]
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-td33824.html
> > > <
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-td33824.html
> > > >
> >
>


Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2019-10-16 Thread Bowen Li
Hi Yijie,

Per the discussion, maybe you can move the pulsar source to the 'future
work' section of the FLIP for now?

Besides, the FLIP seems quite rough at the moment, and I'd recommend adding
more details.

A few questions, mainly regarding the proposed pulsar catalog:

   - Can you provide some background on the pulsar schema registry and how
   it works?
   - The proposed design of the pulsar catalog is very vague right now; can
   you share some details of how a pulsar catalog would work internally? E.g.
  - which APIs does it support exactly? E.g. I see from your prototype
  that table creation is supported but not alteration.
  - is it going to connect to a pulsar schema registry via an HTTP
  client or a pulsar client, etc.?
  - will it be able to handle multiple versions of pulsar, or just one?
  How is compatibility handled between different Flink-Pulsar versions?
  - will it support only reading from the pulsar schema registry, or both
  read/write? Will it work end-to-end in Flink SQL for users to create and
  manipulate a pulsar table such as "CREATE TABLE t WITH
  PROPERTIES(type=pulsar)" and "DROP TABLE t" (see the sketch after this
  list)?
  - Is a pulsar topic always going to be a non-partitioned table? How is a
  partitioned topic mapped to a Flink table?
   - How to map Flink's catalog/database namespace to pulsar's multi-tenant
   namespaces? I'm not very familiar with how multi-tenancy works in pulsar,
   and some background context/use cases may help here too. E.g.
  - can a pulsar client/consumer/producer serve multiple tenants at the
  same time?
  - how does authentication work in pulsar's multi-tenancy and the
  catalog? Asking since I didn't see username/password configs in the
  proposed pulsar catalog.
  - the FLIP seems to propose mapping a pulsar cluster and
  'tenant/namespace' respectively to Flink's 'catalog' and 'database'. I
  wonder whether that fully makes sense, or whether we should actually map
  "tenant" to "catalog" and "namespace" to "database"?

Cheers,
Bowen

On Fri, Sep 20, 2019 at 1:16 AM Yijie Shen 
wrote:

> Hi everyone,
>
> Per discussion in the previous thread
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Contribute-Pulsar-Flink-connector-back-to-Flink-tc32538.html
> >,
> I have created FLIP-72 to kick off a more detailed discussion on the Flink
> Pulsar connector:
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-72%3A+Introduce+Pulsar+Connector
>
> In short, the connector has the following features:
>
>-
>
>Pulsar as a streaming source with exactly-once guarantee.
>-
>
>Sink streaming results to Pulsar with at-least-once semantics.
>-
>
>Built upon Flink's new Table API type system (FLIP-37
><
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> >
>), and can automatically (de)serialize messages with the help of Pulsar
>schema.
>-
>
>Integrates with Flink's new Catalog API (FLIP-30
><
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
> >),
>which enables the use of Pulsar topics as tables in Table API as well as
>SQL client.
>
>
>
> https://docs.google.com/document/d/1rES79eKhkJxrRfQp1b3u8LB2aPaq-6JaDHDPJIA8kMY/edit#heading=h.28v5v23yeq1u
>
>
> Would love to hear your thoughts on this.
>
> Best,
> Yijie
>


Re: [VOTE] FLIP-68: Extend Core Table System with Pluggable Modules

2019-10-17 Thread Bowen Li
Thanks for pointing them out, Dawid. I've gone over the overall doc again
and corrected the above typos.

- ModuleManager#listFunctions() returns Set
- ModuleManager holds a LinkedHashMap to keep loaded
modules in order
- ModuleFactory#createModule(Map) and returns Module
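
For readability, here is a rough, illustrative sketch of the corrected
signatures, assuming the generic parameters (which did not survive in the
plain text above) were Set<String>, LinkedHashMap<String, Module>, and
Map<String, String>; the method bodies are only a sketch, not the actual
implementation.

    import java.util.HashSet;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    import org.apache.flink.table.factories.TableFactory;

    interface Module {
        // Names of the functions provided by this module.
        Set<String> listFunctions();
    }

    interface ModuleFactory extends TableFactory {
        // Takes only the properties, with no module-name parameter.
        Module createModule(Map<String, String> properties);
    }

    class ModuleManager {
        // A LinkedHashMap keeps modules in the order they were loaded, and
        // its keys allow unloading a module by name.
        private final LinkedHashMap<String, Module> loadedModules = new LinkedHashMap<>();

        Set<String> listFunctions() {
            Set<String> names = new HashSet<>();
            for (Module module : loadedModules.values()) {
                names.addAll(module.listFunctions());
            }
            return names;
        }
    }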


On Thu, Oct 17, 2019 at 2:27 AM Dawid Wysakowicz 
wrote:

> Hi all,
>
> Generally I'm fine with the design. Before I cast my +1 I wanted to
> clarify one thing. Is the module name in ModuleFactory#createModule
> necessary? Can't it be just?:
>
> interface ModuleFactory extends TableFactory {
>Module createModule(Map properties);
> }
>
> The name under which the module was registered should not affect the
> implementation of the module as far as I can tell. Could we remove this
> parameter from the method?
>
> I also spotted a few "bugs" in the design, but they do not affect the
> outcome of the design, as they are either just artifacts of refactoring the
> FLIP or affect only the internal implementation:
>
>- there is a typo in the ModuleFactory#createModule return type. It
>should be Module instead of Plugin
>- the return type of ModuleManager:listFunctions() should be
>Set instead of Set>, right?
>- we cannot use list to store the modules in ModuleManager if I am not
>mistaken. We need to store them in a Map to e.g. be able to unload the
>modules by its name.
>
> Best,
>
> Dawid
> On 17/10/2019 04:16, Jark Wu wrote:
>
> +1
>
> Thanks,
> Jark
>
> On Thu, 17 Oct 2019 at 04:44, Peter Huang  
> 
> wrote:
>
>
> +1 Thanks
>
> On Wed, Oct 16, 2019 at 12:48 PM Xuefu Z  
>  wrote:
>
>
> +1 (non-binding)
>
> On Wed, Oct 16, 2019 at 2:26 AM Timo Walther  
>  wrote:
>
>
> +1
>
> Thanks,
> Timo
>
>
> On 15.10.19 20:50, Bowen Li wrote:
>
> Hi all,
>
> I'd like to kick off a voting thread for FLIP-68: Extend Core Table
>
> System
>
> with Pluggable Modules [1], as we have reached consensus in [2].
>
> The voting period will be open for at least 72 hours, ending at 7pm
>
> Oct
>
> 18
>
> UTC.
>
> Thanks,
> Bowen
>
> [1]
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
>
> [2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
>
>


Re: [VOTE] FLIP-68: Extend Core Table System with Pluggable Modules

2019-10-18 Thread Bowen Li
Thanks Dawid and everyone.

I'm hereby glad to announce that we have unanimously approved this FLIP
with 5 +1 votes, 3 binding (Timo, Jark, Dawid) and 2 non-binding (Xuefu,
Peter), and no -1.

This FLIP shall move to the implementation phase and will target Flink 1.10.


On Fri, Oct 18, 2019 at 1:29 AM Dawid Wysakowicz 
wrote:

> Thank you Bowen for the update. Great to hear we can have just
>
> ModuleFactory#createModule(Map)
>
> +1 for the FLIP. Nice design BTW ;)
>
> Best,
>
> Dawid
>
>
> On 17/10/2019 18:36, Bowen Li wrote:
> > Thanks for pointing them out, Dawid. I've gone over the overall doc again
> > and corrected the above typos.
> >
> > - ModuleManager#listFunctions() returns Set
> > - ModuleManager holds a LinkedHashMap to keep loaded
> > modules in order
> > - ModuleFactory#createModule(Map) and returns Module
> >
> >
> > On Thu, Oct 17, 2019 at 2:27 AM Dawid Wysakowicz  >
> > wrote:
> >
> >> Hi all,
> >>
> >> Generally I'm fine with the design. Before I cast my +1 I wanted to
> >> clarify one thing. Is the module name in ModuleFactory#createModule
> >> necessary? Can't it be just?:
> >>
> >> interface ModuleFactory extends TableFactory {
> >>Module createModule(Map properties);
> >> }
> >>
> >> The name under which the module was registered should not affect the
> >> implementation of the module as far as I can tell. Could we remove this
> >> parameter from the method?
> >>
> >> I also spotted a few "bugs" in the design, but they do not affect the
> >> outcome of the design, as they are either just artifacts of refactoring
> the
> >> FLIP or affect only the internal implementation:
> >>
> >>- there is a typo in the ModuleFactory#createModule return type. It
> >>should be Module instead of Plugin
> >>- the return type of ModuleManager:listFunctions() should be
> >>Set instead of Set>, right?
> >>- we cannot use list to store the modules in ModuleManager if I am
> not
> >>mistaken. We need to store them in a Map to e.g. be able to unload
> the
> >>modules by its name.
> >>
> >> Best,
> >>
> >> Dawid
> >> On 17/10/2019 04:16, Jark Wu wrote:
> >>
> >> +1
> >>
> >> Thanks,
> >> Jark
> >>
> >> On Thu, 17 Oct 2019 at 04:44, Peter Huang 
> 
> >> wrote:
> >>
> >>
> >> +1 Thanks
> >>
> >> On Wed, Oct 16, 2019 at 12:48 PM Xuefu Z  <
> usxu...@gmail.com> wrote:
> >>
> >>
> >> +1 (non-binding)
> >>
> >> On Wed, Oct 16, 2019 at 2:26 AM Timo Walther  <
> twal...@apache.org> wrote:
> >>
> >>
> >> +1
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >> On 15.10.19 20:50, Bowen Li wrote:
> >>
> >> Hi all,
> >>
> >> I'd like to kick off a voting thread for FLIP-68: Extend Core Table
> >>
> >> System
> >>
> >> with Pluggable Modules [1], as we have reached consensus in [2].
> >>
> >> The voting period will be open for at least 72 hours, ending at 7pm
> >>
> >> Oct
> >>
> >> 18
> >>
> >> UTC.
> >>
> >> Thanks,
> >> Bowen
> >>
> >> [1]
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
> >>
> >> [2] https://www.mail-archive.com/dev@flink.apache.org/msg29894.html
> >>
> >> --
> >> Xuefu Zhang
> >>
> >> "In Honey We Trust!"
> >>
> >>
> >>
>
>


Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-29 Thread Bowen Li
Congrats Becket!

On Tue, Oct 29, 2019 at 06:32 Till Rohrmann  wrote:

> Congrats Becket :-)
>
> On Tue, Oct 29, 2019 at 10:27 AM Yang Wang  wrote:
>
> > Congratulations Becket :)
> >
> > Best,
> > Yang
> >
> > > On Tue, Oct 29, 2019 at 4:31 PM, Vijay Bhaskar  wrote:
> >
> > > Congratulations Becket
> > >
> > > Regards
> > > Bhaskar
> > >
> > > On Tue, Oct 29, 2019 at 1:53 PM Danny Chan 
> wrote:
> > >
> > > > Congratulations :)
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Oct 29, 2019 at 4:14 PM +0800, dev@flink.apache.org wrote:
> > > > >
> > > > > Congratulations :)
> > > >
> > >
> >
>

