Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Jark Wu
I agree with Xuefu that handling functions inconsistently with all the other
objects is not a big problem.

Regarding option #3, the special "system.system" namespace may confuse users.
Users would need to know the set of built-in function names to know when to use
the "system.system" namespace.
What will happen if a user registers a non-built-in function name under the
"system.system" namespace?
Besides, I don't think it solves the "explode" problem I mentioned at the
beginning of this thread.
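
To make the concern concrete, here is a minimal sketch (hedged: the
"system.system" namespace is only the proposal under discussion, function DDL
does not exist in Flink yet, and the class names are hypothetical):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class Option3Sketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().build());

            // Shadowing a built-in function: the user must already know that
            // "lower" is built-in to pick the special namespace.
            tEnv.sqlUpdate(
                "CREATE TEMPORARY FUNCTION system.system.`lower` AS 'com.example.MyLower'");

            // Shadowing a catalog function: an ordinary 3-part identifier.
            tEnv.sqlUpdate(
                "CREATE TEMPORARY FUNCTION mycat.mydb.myfunc AS 'com.example.MyFunc'");
        }
    }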

So here is my vote:

+1 for #1
0 for #2
-1 for #3

Best,
Jark


On Thu, 19 Sep 2019 at 08:38, Xuefu Z  wrote:

> @Dawid, Re: we also don't need to reference the special catalog anywhere.
>
> True. But once we allow such a reference, a user can use it in any place
> where a function name is expected, and we have to handle all of those.
> That's a big difference, I think.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> wysakowicz.da...@gmail.com>
> wrote:
>
> > @Bowen I am not suggesting introducing an additional catalog. I think we
> > need to get rid of the current built-in catalog.
> >
> > @Xuefu in option #3 we also don't need to reference the special catalog
> > anywhere else besides in the CREATE statement. The resolution
> > behaviour is exactly the same in both options.
> >
> > On Thu, 19 Sep 2019, 08:17 Xuefu Z,  wrote:
> >
> > > Hi Dawid,
> > >
> > > "GLOBAL" is a temporary keyword that was given to the approach. It can
> be
> > > be changed to something better.
> > >
> > > The difference between this and the #3 approach is that we only need
> the
> > > keyword for this create DDL. For other places (such as function
> > > referencing), no keyword or special namespace is needed.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > wysakowicz.da...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > I think it makes sense to start voting at this point.
> > > >
> > > > Option 1: Only 1-part identifiers
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > CONS:
> > > > - inconsistent with all the other objects, both permanent & temporary
> > > > - does not allow shadowing catalog functions
> > > >
> > > > Option 2: Special keyword for built-in function
> > > > I think this is quite similar to the special catalog/db. The thing I
> am
> > > > strongly against in this proposal is the GLOBAL keyword. This keyword
> > > has a
> > > > meaning in RDBMS systems and means a function that is present for the
> > > > lifetime of the session in which it was created, but available in all
> > other
> > > > sessions. Therefore I really don't want to use this keyword in a
> > > different
> > > > context.
> > > >
> > > > Option 3: Special catalog/db
> > > >
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > - allows shadowing catalog functions
> > > > - consistent with other objects
> > > > CONS:
> > > > - we introduce a special namespace for built-in functions
> > > >
> > > > I don't see a problem with introducing the special namespace. In the
> > end
> > > it
> > > > is very similar to the keyword approach. In this case the catalog/db
> > > > combination would be the "keyword"
> > > >
> > > > Therefore my votes:
> > > > Option 1: -0
> > > > Option 2: -1 (I might change to +0 if we can come up with a better
> > > keyword)
> > > > Option 3: +1
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > >
> > > > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > Thanks for the summary and these are great questions to be
> answered.
> > > The
> > > > > answer to your first question is clear: there is a general
> agreement
> > to
> > > > > override built-in functions with temp functions.
> > > > >
> > > > > However, your second and third questions are sort of related, as a
> > > > function
> > > > > reference can be either just a function name (like "func") or in
> > > > > the form of "cat.db.func". When a reference is just a function name,
> > > > > it can mean
> > > > either a
> > > > > built-in function or a function defined in the current cat/db. If
> we
> > > > > support overriding a built-in function with a temp function, such
> > > > > overriding can also cover a function in the current cat/db.
> > > > >
> > > > > I think what Timo referred to as "overriding a catalog function"
> means a
> > > > temp
> > > > > function defined as "cat.db.func" overrides a catalog function
> "func"
> > > in
> > > > > cat/db even if cat/db is not current. To support this, temp
> function
> > > has
> > > > to
> > > > > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > > > questions
> > > > > are related. The problem with such support is the ambiguity when
> a user
> > > > > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> > ...".
> > > > > Here "func" can means a global temp function, or a temp function in
> > > > current
> > > > > cat/db. If we can assume the former, 

[VOTE] FLIP-66: Support Time Attribute in SQL DDL

2019-09-18 Thread Jark Wu
Hi all,

I would like to start the vote for FLIP-66 [1], which has been discussed and
reached a consensus in the discussion thread [2].

The vote will be open for at least 72 hours. I'll try to close it after
Sep. 24 08:00 UTC, unless there is an objection or not enough votes.

Thanks,
Jark

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL

[2]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-66-Support-time-attribute-in-SQL-DDL-tt32766.html


Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-18 Thread Dawid Wysakowicz
Hi JingsongLee,
From my understanding they can. Underneath they will be CatalogTables. The
difference is the lifetime of the tables. Plus, some of the user-facing
interfaces cannot be persisted, e.g. DataStream. Therefore we must have
separate methods for those. In the end, the temporary tables are held in
memory as CatalogTables.
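
For illustration, a minimal sketch of the two lifetimes (hedged:
createTemporaryView is the API shape proposed in FLIP-64 and may still change):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class TemporaryTableSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            DataStream<Long> stream = env.fromElements(1L, 2L, 3L);

            // Held in memory as a CatalogTable for the lifetime of the session;
            // it cannot be persisted, because the backing DataStream is not
            // expressible as properties.
            tEnv.createTemporaryView("tmp_numbers", stream);
        }
    }
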
Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee, 
wrote:

> Hi Dawid:
> Can temporary tables achieve the same capabilities as catalog tables?
> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> PartitionStatistics
> like partition support: we have added some catalog equivalent interfaces
> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> Maybe it's not a good idea to add these interfaces to
> TableSource/TableSink. What do you think?
>
> Best,
> Jingsong Lee
>
>
> --
> From:Kurt Young 
> Send Time:2019年9月18日(星期三) 17:54
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi all,
>
> Sorry to join this party late. Big +1 to this FLIP, especially for dropping
> the "registerTableSink & registerTableSource" part. These are indeed legacy
> APIs, and we should try to unify them through CatalogTable after we introduce
> the concept of Catalog.
>
> From my understanding, what we register should all be metadata;
> TableSource/TableSink should only be the one responsible for doing
> the real work, i.e. reading and writing data according to the schema and
> other information like computed columns, partitions, etc.
>
> Best,
> Kurt
>
>
> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee  .invalid>
> wrote:
>
> > After some development and thinking, I have a general understanding.
> > +1 to registering a source/sink does not fit into the SQL world.
> > I am OK to have a deprecated registerTemporarySource/Sink to be compatible
> > with the old ways.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Timo Walther 
> > Send Time:2019年9月17日(星期二) 08:00
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> >
> > Hi Dawid,
> >
> > thanks for the design document. It fixes big concept gaps due to
> > historical reasons with proper support for serializability and catalog
> > support in mind.
> >
> > I would not mind a registerTemporarySource/Sink, but the problem that I
> > see is that many people think that this is the recommended way of
> > registering a table source/sink which is not true. We should guide users
> > to either use connect() or DDL API which can be validated and stored in
> > catalog.
> >
> > Also from a concept perspective, registering a source/sink does not fit
> > into the SQL world. SQL does not know about source/sinks but only about
> > tables. If the responsibility of a TableSource/TableSink is just a pure
> > physical data consumer/producer that is not connected to the actual
> > logical table schema, we would need a possibility of defining time
> > attributes and interpreting/converting a changelog. This should be done
> > by the framework with information from the DDL/connect() and not be
> > defined in every table source.
> >
> > Regards,
> > Timo
> >
> >
> > On 09.09.19 14:16, JingsongLee wrote:
> > > Hi dawid:
> > >
> > > It is difficult to describe specific examples.
> > > Sometimes users will generate some Java converters through some
> > > Java code, or generate some Java classes through third-party
> > > libraries. Of course, these can best be done through properties,
> > > but this requires additional work from users. My suggestion is to
> > > keep this user-friendly way of passing Java instances.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > --
> > > From:Dawid Wysakowicz 
> > > Send Time:2019年9月6日(星期五) 16:21
> > > To:dev 
> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> > >
> > > Hi all,
> > > @Jingsong Could you elaborate a bit more on what you mean by
> > > "some Connectors are difficult to convert all states to properties"
> > > All the Flink provided connectors will definitely be expressible with
> > properties (in the end you should be able to use them from DDL). I think
> > a TableSource that is complex enough to handle filter push-down,
> > partition support, etc. should rather be made available both from DDL &
> > Java/Scala code. I'm happy to reconsider adding registerTemporaryTable(String path,
> > TableSource source) if you have some concrete examples in mind.
> > >
> > >
> > > @Xuefu: We also considered the ObjectIdentifier (or actually
> introducing
> > a new identifier representation to differentiate between resolved and
> > unresolved identifiers) with the same concerns. We decided to suggest the
> > string & parsing logic because of usability.
> > >  tEnv.from("cat.db.table")
> 

Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-18 Thread Kurt Young
+1 to start the vote process.

Best,
Kurt


On Thu, Sep 19, 2019 at 10:54 AM Jark Wu  wrote:

> Hi everyone,
>
> Thanks all for joining the discussion in the doc[1].
> It seems that the discussion has converged and there is a consensus on the
> current FLIP document.
> If there is no objection, I would like to convert it into a cwiki FLIP page
> and start the voting process.
>
> For more details, please refer to the design doc (it is slightly changed
> since the initial proposal).
>
> Thanks,
> Jark
>
> [1]:
>
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd
>
> On Mon, 16 Sep 2019 at 16:12, Kurt Young  wrote:
>
> > After some review and discussion in the google document, I think it's
> time
> > to
> > convert this design to a cwiki FLIP page and start the voting process.
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Sep 9, 2019 at 7:46 PM Jark Wu  wrote:
> >
> > > Hi all,
> > >
> > > Thanks all for so much feedback received in the doc so far.
> > > I saw a general agreement on using computed column to support proctime
> > > attribute and extract timestamps.
> > > So we will prepare a computed column FLIP and share in the dev ML soon.
> > >
> > > Feel free to leave more comments!
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Fri, 6 Sep 2019 at 13:50, Dian Fu  wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > Thanks for bringing up this discussion and the detailed design doc.
> > This
> > > > is definitely a critical feature for streaming SQL jobs. I have left
> a
> > > few
> > > > comments in the design doc.
> > > >
> > > > Thanks,
> > > > Dian
> > > >
> > > > > 在 2019年9月6日,上午11:48,Forward Xu  写道:
> > > > >
> > > > > Thanks Jark for this topic, This will be very useful.
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > ForwardXu
> > > > >
> > > > > Danny Chan  于2019年9月6日周五 上午11:26写道:
> > > > >
> > > > >> Thanks Jark for bringing up this topic, this is definitely an
> > > > >> important feature for SQL, especially for DDL users.
> > > > >>
> > > > >> I will spend some time reviewing this design doc, really thanks.
> > > > >>
> > > > >> Best,
> > > > >> Danny Chan
> > > > >> 在 2019年9月6日 +0800 AM11:19,Jark Wu ,写道:
> > > > >>> Hi everyone,
> > > > >>>
> > > > >>> I would like to start discussion about how to support time
> > attribute
> > > in
> > > > >> SQL
> > > > >>> DDL.
> > > > >>> In Flink 1.9, we already introduced a basic SQL DDL to create a
> > > table.
> > > > >>> However, it doesn't support defining time attributes. This means
> > > > >>> users can't apply window operations on the tables created by DDL,
> > > > >>> which is a bad experience.
> > > > >>>
> > > > >>> In FLIP-66, we propose a syntax for watermark to define rowtime
> > > > attribute
> > > > >>> and propose to use computed column syntax to define proctime
> > > attribute.
> > > > >>> But computed column is another big topic and deserves a
> > > separate
> > > > >>> FLIP.
> > > > >>> If we have a consensus on the computed column approach, we will
> > start
> > > > >>> computed column FLIP soon.
> > > > >>>
> > > > >>> FLIP-66:
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> > > > >>>
> > > > >>> Thanks for any feedback!
> > > > >>>
> > > > >>> Best,
> > > > >>> Jark
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-18 Thread Jark Wu
Hi everyone,

Thanks all for joining the discussion in the doc[1].
It seems that the discussion has converged and there is a consensus on the
current FLIP document.
If there is no objection, I would like to convert it into a cwiki FLIP page
and start the voting process.

For more details, please refer to the design doc (it is slightly changed
since the initial proposal).

Thanks,
Jark

[1]:
https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd

On Mon, 16 Sep 2019 at 16:12, Kurt Young  wrote:

> After some review and discussion in the google document, I think it's time
> to
> convert this design to a cwiki FLIP page and start the voting process.
>
> Best,
> Kurt
>
>
> On Mon, Sep 9, 2019 at 7:46 PM Jark Wu  wrote:
>
> > Hi all,
> >
> > Thanks all for so much feedback received in the doc so far.
> > I saw a general agreement on using computed column to support proctime
> > attribute and extract timestamps.
> > So we will prepare a computed column FLIP and share in the dev ML soon.
> >
> > Feel free to leave more comments!
> >
> > Best,
> > Jark
> >
> >
> >
> > On Fri, 6 Sep 2019 at 13:50, Dian Fu  wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for bringing up this discussion and the detailed design doc.
> This
> > > is definitely a critical feature for streaming SQL jobs. I have left a
> > few
> > > comments in the design doc.
> > >
> > > Thanks,
> > > Dian
> > >
> > > > 在 2019年9月6日,上午11:48,Forward Xu  写道:
> > > >
> > > > Thanks Jark for this topic, This will be very useful.
> > > >
> > > >
> > > > Best,
> > > >
> > > > ForwardXu
> > > >
> > > > Danny Chan  于2019年9月6日周五 上午11:26写道:
> > > >
> > > >> Thanks Jark for bringing up this topic, this is definitely an
> > > >> important feature for SQL, especially for DDL users.
> > > >>
> > > >> I will spend some time reviewing this design doc, really thanks.
> > > >>
> > > >> Best,
> > > >> Danny Chan
> > > >> 在 2019年9月6日 +0800 AM11:19,Jark Wu ,写道:
> > > >>> Hi everyone,
> > > >>>
> > > >>> I would like to start discussion about how to support time
> attribute
> > in
> > > >> SQL
> > > >>> DDL.
> > > >>> In Flink 1.9, we already introduced a basic SQL DDL to create a
> > table.
> > > >>> However, it doesn't support defining time attributes. This means
> > > >>> users can't apply window operations on the tables created by DDL,
> > > >>> which is a bad experience.
> > > >>>
> > > >>> In FLIP-66, we propose a syntax for watermark to define rowtime
> > > attribute
> > > >>> and propose to use computed column syntax to define proctime
> > attribute.
> > > >>> But computed column is another big topic and deserves a
> > separate
> > > >>> FLIP.
> > > >>> If we have a consensus on the computed column approach, we will
> start
> > > >>> computed column FLIP soon.
> > > >>>
> > > >>> FLIP-66:
> > > >>>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> > > >>>
> > > >>> Thanks for any feedback!
> > > >>>
> > > >>> Best,
> > > >>> Jark
> > > >>
> > >
> > >
> >
>


Confluence permission for FLIP creation

2019-09-18 Thread Terry Wang
Hi all, 

As communicated in an email thread, I'm proposing a Flink SQL DDL enhancement. I
have a draft design doc that I'd like to convert to a FLIP. Thus, it would
be great if someone could grant me write access to Confluence. My
Confluence ID is zjuwangg.

It would be nice if any of you could help with this.

Best,
Terry Wang





Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-18 Thread JingsongLee
Hi Dawid:
Can temporary tables achieve the same capabilities as catalog tables?
like statistics: CatalogTableStatistics, CatalogColumnStatistics, 
PartitionStatistics
like partition support: we have added some catalog equivalent interfaces on 
TableSource/TableSink: getPartitions, getPartitionFieldNames
Maybe it's not a good idea to add these interfaces to TableSource/TableSink. 
What do you think?
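
For context, a sketch of the kind of catalog-equivalent interface being
discussed (a hypothetical shape mirroring the method names above, not
necessarily the exact Flink signatures):

    import java.util.List;
    import java.util.Map;

    /** Hypothetical partition-aware source, mirroring the methods named above. */
    interface PartitionAwareTableSource {
        /** All partitions, each as a map of partition key -> value. */
        List<Map<String, String>> getPartitions();

        /** The partition keys, in order. */
        List<String> getPartitionFieldNames();
    }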

Best,
Jingsong Lee


--
From:Kurt Young 
Send Time:2019年9月18日(星期三) 17:54
To:dev 
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,

Sorry to join this party late. Big +1 to this FLIP, especially for dropping
the "registerTableSink & registerTableSource" part. These are indeed legacy
APIs, and we should try to unify them through CatalogTable after we introduce
the concept of Catalog.

From my understanding, what we register should all be metadata;
TableSource/TableSink should only be the one responsible for doing
the real work, i.e. reading and writing data according to the schema and
other information like computed columns, partitions, etc.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee 
wrote:

> After some development and thinking, I have a general understanding.
> +1 to registering a source/sink does not fit into the SQL world.
> I am OK to have a deprecated registerTemporarySource/Sink to be compatible
> with the old ways.
>
> Best,
> Jingsong Lee
>
>
> --
> From:Timo Walther 
> Send Time:2019年9月17日(星期二) 08:00
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi Dawid,
>
> thanks for the design document. It fixes big concept gaps due to
> historical reasons with proper support for serializability and catalog
> support in mind.
>
> I would not mind a registerTemporarySource/Sink, but the problem that I
> see is that many people think that this is the recommended way of
> registering a table source/sink which is not true. We should guide users
> to either use connect() or DDL API which can be validated and stored in
> catalog.
>
> Also from a concept perspective, registering a source/sink does not fit
> into the SQL world. SQL does not know about source/sinks but only about
> tables. If the responsibility of a TableSource/TableSink is just a pure
> physical data consumer/producer that is not connected to the actual
> logical table schema, we would need a possibility of defining time
> attributes and interpreting/converting a changelog. This should be done
> by the framework with information from the DDL/connect() and not be
> defined in every table source.
>
> Regards,
> Timo
>
>
> On 09.09.19 14:16, JingsongLee wrote:
> > Hi dawid:
> >
> > It is difficult to describe specific examples.
> > Sometimes users will generate some Java converters through some
> > Java code, or generate some Java classes through third-party
> > libraries. Of course, these can best be done through properties,
> > but this requires additional work from users. My suggestion is to
> > keep this user-friendly way of passing Java instances.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Dawid Wysakowicz 
> > Send Time:2019年9月6日(星期五) 16:21
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
> >
> > Hi all,
> @Jingsong Could you elaborate a bit more on what you mean by
> > "some Connectors are difficult to convert all states to properties"
> > All the Flink provided connectors will definitely be expressible with
> properties (in the end you should be able to use them from DDL). I think
> a TableSource that is complex enough to handle filter push-down, partition
> support, etc. should rather be made available both from DDL & Java/Scala
> code. I'm happy to reconsider adding registerTemporaryTable(String path,
> TableSource source) if you have some concrete examples in mind.
> >
> >
> > @Xuefu: We also considered the ObjectIdentifier (or actually introducing
> a new identifier representation to differentiate between resolved and
> unresolved identifiers) with the same concerns. We decided to suggest the
> string & parsing logic because of usability.
> >  tEnv.from("cat.db.table")
> > is shorter and easier to write than
> >  tEnv.from(Identifier.for("cat", "db", "name"))
> > And it also implicitly solves the problem of what happens if a user (e.g. used
> to other systems) uses that API in the following manner:
> >  tEnv.from(Identifier.for("db.name"))
> > I'm happy to revisit it if the general consensus is that it's better to
> use the OO approach.
> > Best,
> > Dawid
> >
> > On 06/09/2019 10:00, Xuefu Z wrote:
> >
> > Thanks to Dawid for starting the discussion and writeup. It looks pretty
> > good to me except that I'm a little concerned about the object reference
> > and string parsing in the code, which 

Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-18 Thread Becket Qin
Hi Yijie,

Could you please follow the FLIP process to start a new FLIP [DISCUSSION]
thread in the mailing list?

https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Process

I see two FLIP-69 discussions in the mailing list now. So there is a FLIP
number collision. Can you change the FLIP number to 72?

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 19, 2019 at 12:23 AM Rong Rong  wrote:

> Hi Yijie,
>
> Thanks for sharing the pulsar FLIP.
> Would you mind enabling comments/suggestions on the google doc link? This
> way the contributors from the community can comment on the doc.
>
> Best,
> Rong
>
> On Mon, Sep 16, 2019 at 5:43 AM Yijie Shen 
> wrote:
>
> > Hello everyone,
> >
> > I've drafted a FLIP that describes the current design of the Pulsar
> > connector:
> >
> >
> >
> https://docs.google.com/document/d/1rES79eKhkJxrRfQp1b3u8LB2aPaq-6JaDHDPJIA8kMY/edit#
> >
> > Please take a look and let me know what you think.
> >
> > Thanks,
> > Yijie
> >
> > On Sat, Sep 14, 2019 at 12:08 AM Rong Rong  wrote:
> > >
> > > Hi All,
> > >
> > > Sorry for joining the discussion late and thanks Yijie & Sijie for
> > driving
> > > the discussion.
> > > I also think the Pulsar connector would be a very valuable addition to
> > > Flink. I can also help out a bit on the review side :-)
> > >
> > > Regarding the timeline, I also share concerns with Becket on the
> > > relationship between the new Pulsar connector and FLIP-27.
> > > There's also another discussion just started by Stephan on dropping
> Kafka
> > > 9/10 support on next Flink release [1].  Although the situation is
> > somewhat
> > > different, and Kafka 9/10 connector has been in Flink for almost 3-4
> > years,
> > > based on the discussion I am not sure if a major version release is a
> > > requirement for removing old connector support.
> > >
> > > I think there shouldn't be a blocker if we agree the old connector will
> > be
> > > removed once the FLIP-27 based Pulsar connector is there. As Stephan
> stated,
> > it
> > > is easier to contribute the source sooner and adjust it later.
> > > We should also ensure we clearly communicate the message: for example,
> > > putting an experimental flag on the pre-FLIP27 connector page of the
> > > website, documentations, etc. Any other thoughts?
> > >
> > > --
> > > Rong
> > >
> > > [1]
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html
> > >
> > >
> > > On Fri, Sep 13, 2019 at 8:15 AM Becket Qin 
> wrote:
> > >
> > > > Technically speaking, removing the old connector code is a backwards
> > > > incompatible change which requires a major version bump, i.e. Flink
> > 2.x.
> > > > Given that we don't have a clear plan on when to have the next major
> > > > version release, it seems unclear how long the old connector code
> will
> > be
> > > > there if we check it in right now. Or will we remove the old
> connector
> > > > without a major version bump? In any case, it sounds not quite user
> > > > friendly to those who might use the old Pulsar connector. I am
> not
> > sure
> > > > if it is worth these potential problems in order to have the Pulsar
> > source
> > > > connector checked in one or two months earlier.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Thu, Sep 12, 2019 at 3:52 PM Stephan Ewen 
> wrote:
> > > >
> > > > > Agreed, if we check in the old code, we should make it clear that
> it
> > will
> > > > > be removed as soon as the FLIP-27 based version of the connector is
> > > > there.
> > > > > We should not commit to maintaining the old version, that would be
> > indeed
> > > > > too much overhead.
> > > > >
> > > > > On Thu, Sep 12, 2019 at 3:30 AM Becket Qin 
> > wrote:
> > > > >
> > > > > > Hi Stephan,
> > > > > >
> > > > > > Thanks for the volunteering to help.
> > > > > >
> > > > > > Yes, the overhead would just be review capacity. In fact, I am
> not
> > > > > worrying
> > > > > > too much about the review capacity. That is just a one time cost.
> > My
> > > > > > concern is mainly about the long term burden. Assume we have new
> > source
> > > > > > interface ready in 1.10 with newly added Pulsar connectors in old
> > > > > > interface. Later on if we migrate Pulsar to new source interface,
> > the
> > > > old
> > > > > > Pulsar connector might be deprecated almost immediately after
> > checked
> > > > in,
> > > > > > but we may still have to maintain two code bases. For the
> existing
> > > > > > connectors, we have to do that anyways. But it would be good to
> > avoid
> > > > > > introducing a new connector with the same problem.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > On Tue, Sep 10, 2019 at 6:51 PM Stephan Ewen 
> > wrote:
> > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > Nice to see this lively discussion about the Pulsar 

[jira] [Created] (FLINK-14122) Extend State Processor API to read ListCheckpointed operator state

2019-09-18 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-14122:


 Summary: Extend State Processor API to read ListCheckpointed 
operator state
 Key: FLINK-14122
 URL: https://issues.apache.org/jira/browse/FLINK-14122
 Project: Flink
  Issue Type: Sub-task
  Components: API / DataStream
Affects Versions: 1.10.0
Reporter: Seth Wiesman


The State Processor API cannot read operator state written through the
ListCheckpointed interface because doing so requires access to the
JavaSerializer, which is package-private. Instead of making that class public,
we should offer a readListCheckpointed method to easily read this state.
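
A sketch of how the proposed method might look from the user side (hedged:
readListCheckpointed is the hypothetical method this ticket proposes, and the
uid and savepoint path are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.Savepoint;

    public class ReadListCheckpointedSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            ExistingSavepoint savepoint =
                    Savepoint.load(env, "file:///tmp/savepoint", new MemoryStateBackend());

            // Hypothetical: read operator state that was written through the
            // ListCheckpointed interface, without exposing the package-private
            // JavaSerializer.
            DataSet<Long> state = savepoint.readListCheckpointed("my-operator-uid", Long.class);
            state.print();
        }
    }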



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-48: Pluggable Intermediate Result Storage

2019-09-18 Thread Becket Qin
Hi Stephan,

Sorry for the belated reply. You are right that the functionality proposed
in this FLIP can be implemented outside of the Flink core as an ecosystem
project.

The main motivation of this FLIP is twofold:

1. Improve the performance of intermediate result sharing in the same
session.
Using the internal shuffle service to store cached results has two potential
performance problems:
  a) the cached intermediate results may break operator chaining due to
the addition of a BLOCKING_PERSISTENT edge;
  b) the downstream processor must read all the records in the intermediate
result in order to process it.

A pluggable intermediate result storage will help address both of these
problems. Adding a sink will not break chaining, but just ensures the cached
logical node is not optimized away. The pluggable storage can help
improve performance by making the intermediate results filterable /
projectable, etc. Alternatively, we could make the shuffle service more
sophisticated, but that may complicate things and is not necessary for
normal shuffles.
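
To illustrate what "filterable / projectable" could mean here, a hypothetical
interface sketch (purely illustrative; the actual API is whatever FLIP-48
defines):

    import java.util.List;

    /** Hypothetical pluggable storage for cached intermediate results. */
    interface IntermediateResultStorage<T> {

        /** Persist the records of a cached intermediate result. */
        void write(Iterable<T> records);

        /**
         * Read the result back, pruning columns and rows at the storage layer
         * so that a downstream job does not have to scan every record.
         */
        Iterable<T> read(List<String> projectedFields, String filterPredicate);
    }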

This motivation seems difficult to address as an external library on
top of Flink core, mainly because the in-session intermediate result
cleanup may need the participation of the RM to achieve fault tolerance. Also,
having an external library essentially introduces another way to cache the
in-session intermediate results.

2. Cross session intermediate result sharing.
As you said, this can be implemented as an external library. The only
difference is that users may need to deal with another set of APIs, but that
seems OK.


So for this FLIP, it would be good to see whether we think motivation 1 is
worth addressing or not.

What do you think?

Thanks,

Jiangjie (Becket) Qin

On Thu, Aug 15, 2019 at 11:42 PM Stephan Ewen  wrote:

> Sorry for the late response. So many FLIPs these days.
>
> I am a bit unsure about the motivation here, and whether this needs to be a
> part of Flink. It sounds like this can be perfectly built around Flink as a
> minimal library on top of it, without any change in the core APIs or
> runtime.
>
> The proposal to handle "caching intermediate results" (to make them
> reusable across jobs in a session), and "writing them in different formats
> / indexing them" doesn't sound like it should be the same mechanism.
>
>   - The caching part is a transparent low-level primitive. It avoids
> re-executing a part of the job graph, but otherwise is completely
> transparent to the consumer job.
>
>   - Writing data out in a sink, compressing/indexing it and then reading it
> in another job is also a way of reusing a previous result, but on a
> completely different abstraction level. It is not the same intermediate
> result any more. When the consumer reads from it and applies predicate
> pushdown, etc. then the consumer job looks completely different from a job
> that consumed the original result. It hence needs to be solved on the API
> level via a sink and a source.
>
> I would suggest keeping these concepts separate: Caching (possibly
> automatically) for jobs in a session, and long term writing/sharing of data
> sets.
>
> Solving the "long term writing/sharing" in a library rather than in the
> runtime also has the advantage of not pushing yet more stuff into Flink's
> core, which I believe is also an important criterion.
>
> Best,
> Stephan
>
>
> On Thu, Jul 25, 2019 at 4:53 AM Xuannan Su  wrote:
>
> > Hi folks,
> >
> > I would like to start the FLIP discussion thread about the pluggable
> > intermediate result storage.
> >
> > This is phase 2 of FLIP-36: Support Interactive Programming in Flink.
> > While FLIP-36 provides a default implementation
> of
> > the intermediate result storage using the shuffle service, we would like
> to
> > make the intermediate result storage pluggable so that the user can
> easily
> > swap the storage.
> >
> > We are looking forward to your thought!
> >
> > The FLIP link is the following:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-48%3A+Pluggable+Intermediate+Result+Storage
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-48:+Pluggable+Intermediate+Result+Storage
> > >
> >
> > Best,
> > Xuannan
> >
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Xuefu Z
@Dawid, Re: we also don't need to reference the special catalog anywhere.

True. But once we allow such a reference, a user can use it in any place
where a function name is expected, and we have to handle all of those.
That's a big difference, I think.
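
For example (illustrative; the special namespace is only the proposal under
discussion, and tEnv stands for any TableEnvironment):

    // Once the special namespace is legal in CREATE, the resolver must also
    // accept it anywhere an expression can reference a function:
    tEnv.sqlQuery("SELECT system.system.`lower`(name) FROM users");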

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz 
wrote:

> @Bowen I am not suggesting introducing an additional catalog. I think we need
> to get rid of the current built-in catalog.
>
> @Xuefu in option #3 we also don't need to reference the special catalog
> anywhere else besides in the CREATE statement. The resolution
> behaviour is exactly the same in both options.
>
> On Thu, 19 Sep 2019, 08:17 Xuefu Z,  wrote:
>
> > Hi Dawid,
> >
> > "GLOBAL" is a temporary keyword that was given to the approach. It can be
> > changed to something better.
> >
> > The difference between this and the #3 approach is that we only need the
> > keyword for this create DDL. For other places (such as function
> > referencing), no keyword or special namespace is needed.
> >
> > Thanks,
> > Xuefu
> >
> > On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > wysakowicz.da...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I think it makes sense to start voting at this point.
> > >
> > > Option 1: Only 1-part identifiers
> > > PROS:
> > > - allows shadowing built-in functions
> > > CONS:
> > > - inconsistent with all the other objects, both permanent & temporary
> > > - does not allow shadowing catalog functions
> > >
> > > Option 2: Special keyword for built-in function
> > > I think this is quite similar to the special catalog/db. The thing I am
> > > strongly against in this proposal is the GLOBAL keyword. This keyword
> > has a
> > > meaning in RDBMS systems and means a function that is present for the
> > > lifetime of the session in which it was created, but available in all
> other
> > > sessions. Therefore I really don't want to use this keyword in a
> > different
> > > context.
> > >
> > > Option 3: Special catalog/db
> > >
> > > PROS:
> > > - allows shadowing built-in functions
> > > - allows shadowing catalog functions
> > > - consistent with other objects
> > > CONS:
> > > - we introduce a special namespace for built-in functions
> > >
> > > I don't see a problem with introducing the special namespace. In the
> end
> > it
> > > is very similar to the keyword approach. In this case the catalog/db
> > > combination would be the "keyword"
> > >
> > > Therefore my votes:
> > > Option 1: -0
> > > Option 2: -1 (I might change to +0 if we can come up with a better
> > keyword)
> > > Option 3: +1
> > >
> > > Best,
> > > Dawid
> > >
> > >
> > > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> > >
> > > > Hi Aljoscha,
> > > >
> > > > Thanks for the summary and these are great questions to be answered.
> > The
> > > > answer to your first question is clear: there is a general agreement
> to
> > > > override built-in functions with temp functions.
> > > >
> > > > However, your second and third questions are sort of related, as a
> > > function
> > > > reference can be either just a function name (like "func") or in the
> > > > form of "cat.db.func". When a reference is just a function name, it
> > > > can mean
> > > either a
> > > > built-in function or a function defined in the current cat/db. If we
> > > > support overriding a built-in function with a temp function, such
> > > > overriding can also cover a function in the current cat/db.
> > > >
> > > > I think what Timo referred to as "overriding a catalog function" means a
> > > temp
> > > > function defined as "cat.db.func" overrides a catalog function "func"
> > in
> > > > cat/db even if cat/db is not current. To support this, temp function
> > has
> > > to
> > > > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > > questions
> > > > are related. The problem with such support is the ambiguity when a user
> > > > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func
> ...".
> > > > Here "func" can means a global temp function, or a temp function in
> > > current
> > > > cat/db. If we can assume the former, this creates an inconsistency
> > > because
> > > > "CREATE FUNCTION func" actually means a function in current cat/db.
> If
> > we
> > > > assume the latter, then there is no way for user to create a global
> > temp
> > > > function.
> > > >
> > > > Giving a special namespace for built-in functions may solve the
> > ambiguity
> > > > problem above, but it also introduces artificial catalog/database
> that
> > > needs special treatment and pollutes the cleanliness of the code. I
> > would
> > > > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > > > [GLOBAL] TEMPORARY FUNCTION func".
> > > >
> > > > Thus, I'd like to summarize a few candidate proposals for voting
> > > purposes:
> > > >
> > > > 1. Support only global, temporary functions without namespace. Such
> > temp
> > > functions override built-in functions and catalog functions in
> 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Xuefu Z
Re: The reason why I prefer option 3 is that in option 3 all objects
internally are identified with 3 parts.

True, but the problem we have is not about how to differentiate each type of
object internally. Rather, it's about how a user references an
object unambiguously and consistently.

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 4:55 PM Dawid Wysakowicz 
wrote:

> Last additional comment on Option 2. The reason why I prefer option 3 is
> that in option 3 all objects internally are identified with 3 parts. This
> makes them easier to handle at different locations, e.g. while persisting
> views, as all objects have a uniform representation.
>
> On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, 
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in RDBMS systems and means a function that is present for the
> > lifetime of the session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> > it is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> >
> >> Hi Aljoscha,
> >>
> >> Thanks for the summary and these are great questions to be answered. The
> >> answer to your first question is clear: there is a general agreement to
> >> override built-in functions with temp functions.
> >>
> >> However, your second and third questions are sort of related, as a
> >> function
> >> reference can be either just a function name (like "func") or in the form
> >> of "cat.db.func". When a reference is just a function name, it can mean
> either
> >> a
> >> built-in function or a function defined in the current cat/db. If we
> >> support overriding a built-in function with a temp function, such
> >> overriding can also cover a function in the current cat/db.
> >>
> >> I think what Timo referred to as "overriding a catalog function" means a
> temp
> >> function defined as "cat.db.func" overrides a catalog function "func" in
> >> cat/db even if cat/db is not current. To support this, temp function has
> >> to
> >> be tied to a cat/db. What's why I said above that the 2nd and 3rd
> >> questions
> >> are related. The problem with such support is the ambiguity when user
> >> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> >> Here "func" can means a global temp function, or a temp function in
> >> current
> >> cat/db. If we can assume the former, this creates an inconsistency
> because
> >> "CREATE FUNCTION func" actually means a function in current cat/db. If
> we
> >> assume the latter, then there is no way for user to create a global temp
> >> function.
> >>
> >> Giving a special namespace for built-in functions may solve the
> ambiguity
> >> problem above, but it also introduces artificial catalog/database that
> >> needs special treatment and pollutes the cleanliness of the code. I would
> >> rather introduce a syntax in DDL to solve the problem, like "CREATE
> >> [GLOBAL] TEMPORARY FUNCTION func".
> >>
> >> Thus, I'd like to summarize a few candidate proposals for voting
> purposes:
> >>
> >> 1. Support only global, temporary functions without namespace. Such temp
> >> functions override built-in functions and catalog functions in current
> >> cat/db. The resolution order is: temp functions -> built-in functions ->
> >> catalog functions. (Partially or fully qualified functions have no
> >> ambiguity!)
> >>
> >> 2. In addition to #1, support creating and referencing temporary
> functions
> >> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> >> functions. The resolution order is: global temp functions -> built-in
> >> functions -> temp functions in current cat/db -> catalog function.
> >> (Resolution for partially or fully qualified function reference is: temp
> >> functions -> persistent functions.)
> >>
> >> 3. In addition to #1, support creating and referencing temporary
> functions
> >> associated with a cat/db with a special namespace for 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
@Bowen I am not suggesting introducing an additional catalog. I think we need
to get rid of the current built-in catalog.

@Xuefu in option #3 we also don't need to reference the special catalog
anywhere else besides in the CREATE statement. The resolution
behaviour is exactly the same in both options.

On Thu, 19 Sep 2019, 08:17 Xuefu Z,  wrote:

> Hi Dawid,
>
> "GLOBAL" is a temporary keyword that was given to the approach. It can be
> changed to something better.
>
> The difference between this and the #3 approach is that we only need the
> keyword for this create DDL. For other places (such as function
> referencing), no keyword or special namespace is needed.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> wysakowicz.da...@gmail.com>
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in RDBMS systems and means a function that is present for the
> > lifetime of the session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> it
> > is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> >
> > > Hi Aljoscha,
> > >
> > > Thanks for the summary and these are great questions to be answered.
> The
> > > answer to your first question is clear: there is a general agreement to
> > > override built-in functions with temp functions.
> > >
> > > However, your second and third questions are sort of related, as a
> > function
> > > reference can be either just a function name (like "func") or in the form
> > > of "cat.db.func". When a reference is just a function name, it can mean
> > either a
> > > built-in function or a function defined in the current cat/db. If we
> > > support overriding a built-in function with a temp function, such
> > > overriding can also cover a function in the current cat/db.
> > >
> > > I think what Timo referred to as "overriding a catalog function" means a
> > temp
> > > function defined as "cat.db.func" overrides a catalog function "func"
> in
> > > cat/db even if cat/db is not current. To support this, temp function
> has
> > to
> > > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > questions
> > > are related. The problem with such support is the ambiguity when a user
> > > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> > > Here "func" can means a global temp function, or a temp function in
> > current
> > > cat/db. If we can assume the former, this creates an inconsistency
> > because
> > > "CREATE FUNCTION func" actually means a function in current cat/db. If
> we
> > > assume the latter, then there is no way for user to create a global
> temp
> > > function.
> > >
> > > Giving a special namespace for built-in functions may solve the
> ambiguity
> > > problem above, but it also introduces artificial catalog/database that
> > > needs special treatment and pollutes the cleanliness of the code. I
> would
> > > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > > [GLOBAL] TEMPORARY FUNCTION func".
> > >
> > > Thus, I'd like to summarize a few candidate proposals for voting
> > purposes:
> > >
> > > 1. Support only global, temporary functions without namespace. Such
> temp
> > > functions override built-in functions and catalog functions in current
> > > cat/db. The resolution order is: temp functions -> built-in functions
> ->
> > > catalog functions. (Partially or fully qualified functions have no
> > > ambiguity!)
> > >
> > > 2. In addition to #1, support creating and referencing temporary
> > functions
> > > associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> > > functions. The resolution order is: global temp functions -> built-in
> > > functions -> temp functions in current cat/db -> catalog function.
> > > (Resolution for partially or fully qualified function reference is:
> temp
> > > functions -> 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Xuefu Z
Hi Dawid,

"GLOBAL" is a temporary keyword that was given to the approach. It can be
changed to something better.

The difference between this and the #3 approach is that we only need the
keyword for this CREATE DDL. For other places (such as function
referencing), no keyword or special namespace is needed.

Thanks,
Xuefu

On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz 
wrote:

> Hi,
> I think it makes sense to start voting at this point.
>
> Option 1: Only 1-part identifiers
> PROS:
> - allows shadowing built-in functions
> CONS:
> - inconsistent with all the other objects, both permanent & temporary
> - does not allow shadowing catalog functions
>
> Option 2: Special keyword for built-in function
> I think this is quite similar to the special catalog/db. The thing I am
> strongly against in this proposal is the GLOBAL keyword. This keyword has a
> meaning in RDBMS systems and means a function that is present for the
> lifetime of the session in which it was created, but available in all other
> sessions. Therefore I really don't want to use this keyword in a different
> context.
>
> Option 3: Special catalog/db
>
> PROS:
> - allows shadowing built-in functions
> - allows shadowing catalog functions
> - consistent with other objects
> CONS:
> - we introduce a special namespace for built-in functions
>
> I don't see a problem with introducing the special namespace. In the end it
> is very similar to the keyword approach. In this case the catalog/db
> combination would be the "keyword"
>
> Therefore my votes:
> Option 1: -0
> Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> Option 3: +1
>
> Best,
> Dawid
>
>
> On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
>
> > Hi Aljoscha,
> >
> > Thanks for the summary and these are great questions to be answered. The
> > answer to your first question is clear: there is a general agreement to
> > override built-in functions with temp functions.
> >
> > However, your second and third questions are sort of related, as a
> function
> > reference can be either just a function name (like "func") or in the form
> > of "cat.db.func". When a reference is just a function name, it can mean
> either a
> > built-in function or a function defined in the current cat/db. If we
> > support overriding a built-in function with a temp function, such
> > overriding can also cover a function in the current cat/db.
> >
> > I think what Timo referred to as "overriding a catalog function" means a
> temp
> > function defined as "cat.db.func" overrides a catalog function "func" in
> > cat/db even if cat/db is not current. To support this, temp function has
> to
> > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> questions
> > are related. The problem with such support is the ambiguity when a user
> > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> > Here "func" can means a global temp function, or a temp function in
> current
> > cat/db. If we can assume the former, this creates an inconsistency
> because
> > "CREATE FUNCTION func" actually means a function in current cat/db. If we
> > assume the latter, then there is no way for user to create a global temp
> > function.
> >
> > Giving a special namespace for built-in functions may solve the ambiguity
> > problem above, but it also introduces artificial catalog/database that
> > needs special treatment and pollutes the cleanliness of the code. I would
> > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > [GLOBAL] TEMPORARY FUNCTION func".
> >
> > Thus, I'd like to summarize a few candidate proposals for voting
> purposes:
> >
> > 1. Support only global, temporary functions without namespace. Such temp
> > functions override built-in functions and catalog functions in current
> > cat/db. The resolution order is: temp functions -> built-in functions ->
> > catalog functions. (Partially or fully qualified functions have no
> > ambiguity!)
> >
> > 2. In addition to #1, support creating and referencing temporary
> functions
> > associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> > functions. The resolution order is: global temp functions -> built-in
> > functions -> temp functions in current cat/db -> catalog function.
> > (Resolution for partially or fully qualified function reference is: temp
> > functions -> persistent functions.)
> >
> > 3. In addition to #1, support creating and referencing temporary
> functions
> > associated with a cat/db with a special namespace for built-in functions
> > and global temp functions. The resolution is the same as #2, except that
> > the special namespace might be prefixed to a reference to a built-in
> > function or global temp function. (In absence of the special namespace,
> the
> > resolution order is the same as in #2.)
> >
> > My personal preference is #1, given the unknown use case and introduced
> > complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
> > my votes are:
> 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Bowen Li
Hi,

For #2, as Xuefu and I discussed offline, the key point is to introduce a
keyword to the SQL DDL to distinguish temp functions that override built-in
functions vs. temp functions that override catalog functions. It can be
something other than "GLOBAL", like "BUILTIN" (e.g. "CREATE BUILTIN TEMP
FUNCTION"), given there's no SQL standard for temp functions. This option is
more about whether we want to have such a new keyword in SQL for the
proposed functionality, and I personally think it's acceptable.
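
Spelled out, the two shapes being compared (hedged: neither syntax exists in
Flink yet, the class name is a placeholder, and tEnv stands for any
TableEnvironment):

    // Option #2: a keyword on the DDL marks the override target.
    tEnv.sqlUpdate(
        "CREATE BUILTIN TEMP FUNCTION explode AS 'com.example.MyExplode'");

    // Option #3: a special, virtual namespace marks it instead.
    tEnv.sqlUpdate(
        "CREATE TEMPORARY FUNCTION system.system.explode AS 'com.example.MyExplode'");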

For #3, besides the drawbacks mentioned by Xuefu and Jark, another con is:
we already have a special catalog and db as "default_catalog" and
"default_database", though they are not used very correctly at the moment
(another story...), they are at least physically present. Introducing yet
another virtual special catalog/db as something like "system"."system" that
never physically exists would further confuse users, and
hurt understandability and usability.

Thus my vote is:

+1 for #1
0 for #2
-1 for #3

Thanks,
Bowen

On Wed, Sep 18, 2019 at 4:55 PM Dawid Wysakowicz 
wrote:

> Last additional comment on Option 2. The reason why I prefer option 3 is
> that in option 3 all objects internally are identified with 3 parts. This
> makes them easier to handle at different locations, e.g. while persisting
> views, as all objects have a uniform representation.
>
> On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, 
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in RDBMS systems and means a function that is present for the
> > lifetime of the session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> > it is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> >
> >> Hi Aljoscha,
> >>
> >> Thanks for the summary and these are great questions to be answered. The
> >> answer to your first question is clear: there is a general agreement to
> >> override built-in functions with temp functions.
> >>
> >> However, your second and third questions are sort of related, as a
> >> function
> >> reference can be either just a function name (like "func") or in the form
> >> of "cat.db.func". When a reference is just a function name, it can mean
> either
> >> a
> >> built-in function or a function defined in the current cat/db. If we
> >> support overriding a built-in function with a temp function, such
> >> overriding can also cover a function in the current cat/db.
> >>
> >> I think what Timo referred to as "overriding a catalog function" means a
> temp
> >> function defined as "cat.db.func" overrides a catalog function "func" in
> >> cat/db even if cat/db is not current. To support this, temp function has
> >> to
> >> be tied to a cat/db. That's why I said above that the 2nd and 3rd
> >> questions
> >> are related. The problem with such support is the ambiguity when user
> >> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> >> Here "func" can means a global temp function, or a temp function in
> >> current
> >> cat/db. If we can assume the former, this creates an inconsistency
> because
> >> "CREATE FUNCTION func" actually means a function in current cat/db. If
> we
> >> assume the latter, then there is no way for user to create a global temp
> >> function.
> >>
> >> Giving a special namespace for built-in functions may solve the
> ambiguity
> >> problem above, but it also introduces artificial catalog/database that
> >> needs special treatment and pollutes the cleanness of the code. I would
> >> rather introduce a syntax in DDL to solve the problem, like "CREATE
> >> [GLOBAL] TEMPORARY FUNCTION func".
> >>
> >> Thus, I'd like to summarize a few candidate proposals for voting
> purposes:
> >>
> >> 1. Support only global, temporary functions without namespace. Such temp
> >> functions override built-in functions and catalog functions in the current
> >> cat/db. The 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
Last additional comment on Option 2. The reason why I prefer option 3 is
that in option 3 all objects internally are identified with 3 parts. This
makes it easier to handle at different locations e.g. while persisting
views, as all objects have uniform representation.

On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, 
wrote:

> Hi,
> I think it makes sense to start voting at this point.
>
> Option 1: Only 1-part identifiers
> PROS:
> - allows shadowing built-in functions
> CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> - does not allow shadowing catalog functions
>
> Option 2: Special keyword for built-in function
> I think this is quite similar to the special catalog/db. The thing I am
> strongly against in this proposal is the GLOBAL keyword. This keyword has a
> meaning in RDBMS systems and denotes a function that is present for the
> lifetime of the session in which it was created, but is available in all other
> sessions. Therefore I really don't want to use this keyword in a different
> context.
>
> Option 3: Special catalog/db
>
> PROS:
> - allows shadowing built-in functions
> - allows shadowing catalog functions
> - consistent with other objects
> CONS:
> - we introduce a special namespace for built-in functions
>
> I don't see a problem with introducing the special namespace. In the end
> it is very similar to the keyword approach. In this case the catalog/db
> combination would be the "keyword"
>
> Therefore my votes:
> Option 1: -0
> Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> Option 3: +1
>
> Best,
> Dawid
>
>
> On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
>
>> Hi Aljoscha,
>>
>> Thanks for the summary and these are great questions to be answered. The
>> answer to your first question is clear: there is a general agreement to
>> override built-in functions with temp functions.
>>
>> However, your second and third questions are sort of related, as a
>> function
> >> reference can be either just a function name (like "func") or in the form of
> >> "cat.db.func". When a reference is just a function name, it can mean either
>> a
>> built-in function or a function defined in the current cat/db. If we
>> support overriding a built-in function with a temp function, such
>> overriding can also cover a function in the current cat/db.
>>
>> I think what Timo referred as "overriding a catalog function" means a temp
>> function defined as "cat.db.func" overrides a catalog function "func" in
>> cat/db even if cat/db is not current. To support this, temp function has
>> to
> >> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>> questions
>> are related. The problem with such support is the ambiguity when user
>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
>> Here "func" can means a global temp function, or a temp function in
>> current
>> cat/db. If we can assume the former, this creates an inconsistency because
>> "CREATE FUNCTION func" actually means a function in current cat/db. If we
>> assume the latter, then there is no way for user to create a global temp
>> function.
>>
>> Giving a special namespace for built-in functions may solve the ambiguity
>> problem above, but it also introduces artificial catalog/database that
> >> needs special treatment and pollutes the cleanness of the code. I would
>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>> [GLOBAL] TEMPORARY FUNCTION func".
>>
>> Thus, I'd like to summarize a few candidate proposals for voting purposes:
>>
>> 1. Support only global, temporary functions without namespace. Such temp
> >> functions override built-in functions and catalog functions in the current
>> cat/db. The resolution order is: temp functions -> built-in functions ->
> >> catalog functions. (Partially or fully qualified functions have no
>> ambiguity!)
>>
>> 2. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
>> functions. The resolution order is: global temp functions -> built-in
> >> functions -> temp functions in the current cat/db -> catalog functions.
>> (Resolution for partially or fully qualified function reference is: temp
>> functions -> persistent functions.)
>>
>> 3. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with a special namespace for built-in functions
>> and global temp functions. The resolution is the same as #2, except that
>> the special namespace might be prefixed to a reference to a built-in
>> function or global temp function. (In absence of the special namespace,
>> the
>> resolution order is the same as in #2.)
>>
>> My personal preference is #1, given the unknown use case and introduced
>> complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
>> my votes are:
>>
>> +1 for #1
>> +0 for #2
>> -1 for #3
>>
>> Everyone, please cast your vote (in above format please!), or let me know
>> if you 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
Hi,
I think it makes sense to start voting at this point.

Option 1: Only 1-part identifiers
PROS:
- allows shadowing built-in functions
CONS:
- inconsistent with all the other objects, both permanent & temporary
- does not allow shadowing catalog functions

Option 2: Special keyword for built-in function
I think this is quite similar to the special catalog/db. The thing I am
strongly against in this proposal is the GLOBAL keyword. This keyword has a
meaning in RDBMS systems and denotes a function that is present for the
lifetime of the session in which it was created, but is available in all other
sessions. Therefore I really don't want to use this keyword in a different
context.

Option 3: Special catalog/db

PROS:
- allows shadowing built-in functions
- allows shadowing catalog functions
- consistent with other objects
CONS:
- we introduce a special namespace for built-in functions

I don't see a problem with introducing the special namespace. In the end it
is very similar to the keyword approach. In this case the catalog/db
combination would be the "keyword".

Therefore my votes:
Option 1: -0
Option 2: -1 (I might change to +0 if we can come up with a better keyword)
Option 3: +1

Best,
Dawid


On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:

> Hi Aljoscha,
>
> Thanks for the summary and these are great questions to be answered. The
> answer to your first question is clear: there is a general agreement to
> override built-in functions with temp functions.
>
> However, your second and third questions are sort of related, as a function
> reference can be either just a function name (like "func") or in the form of
> "cat.db.func". When a reference is just a function name, it can mean either a
> built-in function or a function defined in the current cat/db. If we
> support overriding a built-in function with a temp function, such
> overriding can also cover a function in the current cat/db.
>
> I think what Timo referred as "overriding a catalog function" means a temp
> function defined as "cat.db.func" overrides a catalog function "func" in
> cat/db even if cat/db is not current. To support this, temp function has to
> be tied to a cat/db. That's why I said above that the 2nd and 3rd questions
> are related. The problem with such support is the ambiguity when user
> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> Here "func" can means a global temp function, or a temp function in current
> cat/db. If we can assume the former, this creates an inconsistency because
> "CREATE FUNCTION func" actually means a function in current cat/db. If we
> assume the latter, then there is no way for user to create a global temp
> function.
>
> Giving a special namespace for built-in functions may solve the ambiguity
> problem above, but it also introduces artificial catalog/database that
> needs special treatment and pollutes the cleanness of the code. I would
> rather introduce a syntax in DDL to solve the problem, like "CREATE
> [GLOBAL] TEMPORARY FUNCTION func".
>
> Thus, I'd like to summarize a few candidate proposals for voting purposes:
>
> 1. Support only global, temporary functions without namespace. Such temp
> functions override built-in functions and catalog functions in the current
> cat/db. The resolution order is: temp functions -> built-in functions ->
> catalog functions. (Partially or fully qualified functions have no
> ambiguity!)
>
> 2. In addition to #1, support creating and referencing temporary functions
> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
> functions. The resolution order is: global temp functions -> built-in
> functions -> temp functions in the current cat/db -> catalog functions.
> (Resolution for partially or fully qualified function reference is: temp
> functions -> persistent functions.)
>
> 3. In addition to #1, support creating and referencing temporary functions
> associated with a cat/db with a special namespace for built-in functions
> and global temp functions. The resolution is the same as #2, except that
> the special namespace might be prefixed to a reference to a built-in
> function or global temp function. (In absence of the special namespace, the
> resolution order is the same as in #2.)
>
> My personal preference is #1, given the unknown use case and introduced
> complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
> my votes are:
>
> +1 for #1
> +0 for #2
> -1 for #3
>
> Everyone, please cast your vote (in above format please!), or let me know
> if you have more questions or other candidates.
>
> Thanks,
> Xuefu
>
>
>
>
>
>
>
> On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek 
> wrote:
>
> > Hi,
> >
> > I think this discussion and the one for FLIP-64 are very connected. To
> > resolve the differences, I think we have to think about the basic
> principles
> > and find consensus there. The basic questions I see are:
> >
> >  - Do we want to support overriding builtin functions?
> >  - Do we want to support overriding catalog 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Xuefu Z
Hi Aljoscha,

Thanks for the summary and these are great questions to be answered. The
answer to your first question is clear: there is a general agreement to
override built-in functions with temp functions.

However, your second and third questions are sort of related, as a function
reference can be either just a function name (like "func") or in the form of
"cat.db.func". When a reference is just a function name, it can mean either a
built-in function or a function defined in the current cat/db. If we
support overriding a built-in function with a temp function, such
overriding can also cover a function in the current cat/db.

I think what Timo referred as "overriding a catalog function" means a temp
function defined as "cat.db.func" overrides a catalog function "func" in
cat/db even if cat/db is not current. To support this, temp function has to
be tied to a cat/db. That's why I said above that the 2nd and 3rd questions
are related. The problem with such support is the ambiguity when user
defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
Here "func" can means a global temp function, or a temp function in current
cat/db. If we can assume the former, this creates an inconsistency because
"CREATE FUNCTION func" actually means a function in current cat/db. If we
assume the latter, then there is no way for user to create a global temp
function.

Giving a special namespace for built-in functions may solve the ambiguity
problem above, but it also introduces artificial catalog/database that
needs special treatment and pollutes the cleanness of the code. I would
rather introduce a syntax in DDL to solve the problem, like "CREATE
[GLOBAL] TEMPORARY FUNCTION func".

Thus, I'd like to summarize a few candidate proposals for voting purposes:

1. Support only global, temporary functions without namespace. Such temp
functions override built-in functions and catalog functions in the current
cat/db. The resolution order is: temp functions -> built-in functions ->
catalog functions. (Partially or fully qualified functions have no
ambiguity!)

2. In addition to #1, support creating and referencing temporary functions
associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
functions. The resolution order is: global temp functions -> built-in
functions -> temp functions in the current cat/db -> catalog functions.
(Resolution for partially or fully qualified function reference is: temp
functions -> persistent functions.)

3. In addition to #1, support creating and referencing temporary functions
associated with a cat/db with a special namespace for built-in functions
and global temp functions. The resolution is the same as #2, except that
the special namespace might be prefixed to a reference to a built-in
function or global temp function. (In absence of the special namespace, the
resolution order is the same as in #2.)

My personal preference is #1, given the unknown use case and introduced
complexity for #2 and #3. However, #2 is an acceptable alternative. Thus,
my votes are:

+1 for #1
+0 for #2
-1 for #3

Everyone, please cast your vote (in above format please!), or let me know
if you have more questions or other candidates.

Thanks,
Xuefu







On Wed, Sep 18, 2019 at 6:42 AM Aljoscha Krettek 
wrote:

> Hi,
>
> I think this discussion and the one for FLIP-64 are very connected. To
> resolve the differences, I think we have to think about the basic principles
> and find consensus there. The basic questions I see are:
>
>  - Do we want to support overriding builtin functions?
>  - Do we want to support overriding catalog functions?
>  - And then later: should temporary functions be tied to a
> catalog/database?
>
> I don’t have much to say about these, except that we should somewhat stick
> to what the industry does. But I also understand that the industry is
> already very divided on this.
>
> Best,
> Aljoscha
>
> > On 18. Sep 2019, at 11:41, Jark Wu  wrote:
> >
> > Hi,
> >
> > +1 to strive for reaching consensus on the remaining topics. We are
> close to the truth. It will waste a lot of time if we resume the topic some
> time later.
> >
> > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way
> to override a catalog function.
> >
> > I’m not sure about “system.system.fun”, it introduces a nonexistent cat
> & db? And we still need to do special treatment for the dedicated
> system.system cat & db?
> >
> > Best,
> > Jark
> >
> >
> >> On Sep 18, 2019, at 06:54, Timo Walther wrote:
> >>
> >> Hi everyone,
> >>
> >> @Xuefu: I would like to avoid adding too many things incrementally.
> Users should be able to override all catalog objects consistently according
> to FLIP-64 (Support for Temporary Objects in Table module). If functions
> are treated completely different, we need more code and special cases. From
> an implementation perspective, this topic only affects the lookup logic
> which is rather low implementation effort which is why I would like to
> clarify the remaining items. As you 

[jira] [Created] (FLINK-14121) upgrade to commons-compress:1.19 due to CVE

2019-09-18 Thread John Lonergan (Jira)
John Lonergan created FLINK-14121:
-

 Summary: upgrade to commons-compress:1.19 due to CVE
 Key: FLINK-14121
 URL: https://issues.apache.org/jira/browse/FLINK-14121
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Release System
Affects Versions: 1.9.0
Reporter: John Lonergan


See 
https://commons.apache.org/proper/commons-compress/security-reports.html




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Contribute Pulsar Flink connector back to Flink

2019-09-18 Thread Rong Rong
Hi Yijie,

Thanks for sharing the pulsar FLIP.
Would you mind enabling comments/suggestions on the google doc link? This
way the contributors from the community can comment on the doc.

Best,
Rong

On Mon, Sep 16, 2019 at 5:43 AM Yijie Shen 
wrote:

> Hello everyone,
>
> I've drafted a FLIP that describes the current design of the Pulsar
> connector:
>
>
> https://docs.google.com/document/d/1rES79eKhkJxrRfQp1b3u8LB2aPaq-6JaDHDPJIA8kMY/edit#
>
> Please take a look and let me know what you think.
>
> Thanks,
> Yijie
>
> On Sat, Sep 14, 2019 at 12:08 AM Rong Rong  wrote:
> >
> > Hi All,
> >
> > Sorry for joining the discussion late and thanks Yijie & Sijie for
> driving
> > the discussion.
> > I also think the Pulsar connector would be a very valuable addition to
> > Flink. I can also help out a bit on the review side :-)
> >
> > Regarding the timeline, I also share concerns with Becket on the
> > relationship between the new Pulsar connector and FLIP-27.
> > There's also another discussion just started by Stephan on dropping Kafka
> > 9/10 support on next Flink release [1].  Although the situation is
> somewhat
> > different, and Kafka 9/10 connector has been in Flink for almost 3-4
> years,
> > based on the discussion I am not sure if a major version release is a
> > requirement for removing old connector support.
> >
> > I think there shouldn't be a blocker if we agree the old connector will
> be
> > removed once FLIP-27 based Pulsar connector is there. As Stephan stated,
> it
> > is easier to contribute the source sooner and adjust it later.
> > We should also ensure we clearly communicate the message: for example,
> > putting an experimental flag on the pre-FLIP27 connector page of the
> > website, documentations, etc. Any other thoughts?
> >
> > --
> > Rong
> >
> > [1]
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html
> >
> >
> > On Fri, Sep 13, 2019 at 8:15 AM Becket Qin  wrote:
> >
> > > Technically speaking, removing the old connector code is a backwards
> > > incompatible change which requires a major version bump, i.e. Flink
> 2.x.
> > > Given that we don't have a clear plan on when to have the next major
> > > version release, it seems unclear how long the old connector code will
> be
> > > there if we check it in right now. Or will we remove the old connector
> > > without a major version bump? In any case, it sounds not quite user
> > > friendly to those who might use the old Pulsar connector. I am not
> sure
> > > if it is worth these potential problems in order to have the Pulsar
> source
> > > connector checked in one or two months earlier.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Thu, Sep 12, 2019 at 3:52 PM Stephan Ewen  wrote:
> > >
> > > > Agreed, if we check in the old code, we should make it clear that it
> will
> > > > be removed as soon as the FLIP-27 based version of the connector is
> > > there.
> > > > We should not commit to maintaining the old version, that would be
> indeed
> > > > too much overhead.
> > > >
> > > > On Thu, Sep 12, 2019 at 3:30 AM Becket Qin 
> wrote:
> > > >
> > > > > Hi Stephan,
> > > > >
> > > > > Thanks for the volunteering to help.
> > > > >
> > > > > Yes, the overhead would just be review capacity. In fact, I am not
> > > > worrying
> > > > > too much about the review capacity. That is just a one time cost.
> My
> > > > > concern is mainly about the long term burden. Assume we have new
> source
> > > > > interface ready in 1.10 with newly added Pulsar connectors in old
> > > > > interface. Later on if we migrate Pulsar to new source interface,
> the
> > > old
> > > > > Pulsar connector might be deprecated almost immediately after
> checked
> > > in,
> > > > > but we may still have to maintain two code bases. For the existing
> > > > > connectors, we have to do that anyways. But it would be good to
> avoid
> > > > > introducing a new connector with the same problem.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Sep 10, 2019 at 6:51 PM Stephan Ewen 
> wrote:
> > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > Nice to see this lively discussion about the Pulsar connector.
> > > > > > Some thoughts on the open questions:
> > > > > >
> > > > > > ## Contribute to Flink or maintain as a community package
> > > > > >
> > > > > > Looks like the discussion is more going towards contribution. I
> think
> > > > > that
> > > > > > is good, especially if we think that we want to build a similarly
> > > deep
> > > > > > integration with Pulsar as we have for example with Kafka. The
> > > > connector
> > > > > > already looks like a more thorough connector than many others we
> have
> > > > in
> > > > > > the repository.
> > > > > >
> > > > > > With either a repo split, or the new build system, I hope that
> the
> > > > build
> > > > > > overhead is not a problem.
> > > > > >
> 

Re: [DISCUSS] FLIP-56: Dynamic Slot Allocation

2019-09-18 Thread tao xiao
That makes sense. I suggest we add a note to the FLIP to avoid confusion.

On Wed, Sep 18, 2019 at 9:51 AM Xintong Song  wrote:

> @tao
>
> I think we cannot limit the cpu usage of a slot, nor isolate the usage
> between slots. We do have cpu limits for the task executor in some
> scenarios, such as on yarn with strict cgroup mode.
>
> The purpose of bookkeeping and dynamic allocation of cpu cores is to prevent
> scheduling tasks with too much computation load onto the task executor,
> rather than to limit the cpu usage of each slot.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Sep 18, 2019 at 12:18 AM tao xiao  wrote:
>
> > Sorry if I ask a question that has been addressed before. please point me
> > to the reference.
> >
> > How do we limit the cpu usage to a slot?  Does the thread that executes
> the
> > slot get paused when it uses CPU cycles more than it requests?
> >
> > On Tue, Sep 17, 2019 at 10:23 PM Xintong Song 
> > wrote:
> >
> > > Thanks for the feedback, Andrey.
> > >
> > > I'll start the vote.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Sep 17, 2019 at 10:09 PM Andrey Zagrebin  >
> > > wrote:
> > >
> > > > Thanks for the update @Xintong.
> > > > I would be ok with starting the vote.
> > > >
> > > > Best,
> > > > Andrey
> > > >
> > > > On Tue, Sep 17, 2019 at 6:12 AM Xintong Song 
> > > > wrote:
> > > >
> > > > > The implementation plan [1] is updated, with the following changes:
> > > > >
> > > > >- Add default slot resource profile to
> > > > >ResourceManagerGateway#registerTaskExecutor rather than
> > > > #sendSlotReport.
> > > > >- Swap 'TaskExecutor derive and register with default slot
> > resource
> > > > >profile' and 'Extend TaskExecutor to support dynamic slot
> > > allocation'
> > > > >- Add step for updating RestAPI / Web UI
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > >
> > > > > On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <
> tonysong...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > @Till
> > > > > > Thanks for the reminding. I'll add a step for updating the web
> ui.
> > > I'll
> > > > > > try to involve Lining to help us with this step.
> > > > > >
> > > > > > @Andrey
> > > > > > I was thinking that after we define the RM-TM interfaces in step
> 2,
> > > it
> > > > > > would be good to concurrently work on both RM and TM side. But
> yes,
> > > if
> > > > we
> > > > > > finish Step 4 early, then it would make step 6 easier. We can
> start
> > > to
> > > > > have
> > > > > > some IT/E2E tests, with the default slot resource profiles being
> > > > > available.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <
> > > and...@ververica.com>
> > > > > > wrote:
> > > > > >
> > > > > >> @Xintong
> > > > > >>
> > > > > >> Thanks for the feedback.
> > > > > >>
> > > > > >> Just to clarify step 6:
> > > > > >> If the first point is done before step 5 (e.g. as part of 4)
> then
> > it
> > > > is
> > > > > >> just keeping the info about the default slot in RM's data
> > structure
> > > > > >> associated with the TM and no real change in the behaviour.
> > > > > >> When this info is available, I think it can be straightforwardly
> > > used
> > > > > >> during step 5 where we get either concrete slot requirement
> > > > > >> or the unknown one (step 6, point 2) which simply grabs some of
> > the
> > > > > >> concrete default ones (btw not clear which one, seems just some
> > > > random?)
> > > > > >>
> > > > > >> For steps 5,7, true, it is not quite clear whether we can avoid
> > some
> > > > > >> split,
> > > > > >> e.g. after step 5 before doing step 7.
> > > > > >> I agree that we should introduce the feature flag if we clearly
> > see
> > > > that
> > > > > >> it
> > > > > >> would be a bigger effort without the flag.
> > > > > >>
> > > > > >> Best,
> > > > > >> Andrey
> > > > > >>
> > > > > >> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <
> > trohrm...@apache.org
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > One thing which was briefly mentioned in the Flip but not in
> the
> > > > > >> > implementation plan is the update of the web UI. I think it is
> > > worth
> > > > > >> > putting an extra item for updating the web UI to properly
> > display
> > > > the
> > > > > >> > resources a TM has still to offer with dynamic slot
> allocation.
> > I
> > > > > guess
> > > > > >> we
> > > > > >> > need to pull in some JavaScript help in order to implement
> this
> > > > step.
> > > > > >> >
> > > > > >> > Cheers,
> > > > > >> > Till
> > > > > >> >
> > > > > >> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <
> > > tonysong...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Thanks for the comments, 

[jira] [Created] (FLINK-14120) SystemProcessingTimeServiceTest.testImmediateShutdown failed on Travis

2019-09-18 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-14120:
-

 Summary: SystemProcessingTimeServiceTest.testImmediateShutdown 
failed on Travis
 Key: FLINK-14120
 URL: https://issues.apache.org/jira/browse/FLINK-14120
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.10.0
Reporter: Till Rohrmann


The test {{SystemProcessingTimeServiceTest.testImmediateShutdown}} failed on 
Travis with

{code}
java.lang.AssertionError: 

Expected: is null
 but: was 
at 
org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testImmediateShutdown(SystemProcessingTimeServiceTest.java:196)
{code}

https://api.travis-ci.org/v3/job/586514264/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Till Rohrmann
No reason to keep the separation. The NewClusterClient interface was only
introduced to add new methods without having to implement them for the
other ClusterClient implementations.

Cheers,
Till

On Wed, Sep 18, 2019 at 3:17 PM Aljoscha Krettek 
wrote:

> I agree that NewClusterClient and ClusterClient can be merged now that
> there is no pre-FLIP-6 code base anymore.
>
> Side note, there are a lot of methods in ClusterClient that should not
> really be there, in my opinion:
>  - all the getOptimizedPlan*() methods
>  - the run() methods. In the end, only submitJob should be required
>
> We should also see what Till (cc’ed) says, maybe he has an opinion on why
> the separation should be kept.
>
> Best,
> Aljoscha
>
> > On 18. Sep 2019, at 11:54, Zili Chen  wrote:
> >
> > Hi Xiaogang,
> >
> > Thanks for your reply.
> >
> > According to the feature discussion thread[1] client API enhancement is a
> > planned
> > feature towards 1.10 and thus I think this thread is valid if we can
> reach
> > a consensus
> > and introduce new client API in this development cycle.
> >
> > Best,
> > tison.
> >
> > [1]
> >
> https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E
> >
> >
> > SHI Xiaogang wrote on Wed, Sep 18, 2019 at 3:03 PM:
> >
> >> Hi Tison,
> >>
> >> Thanks for bringing this.
> >>
> >> I think it's fine to break the backward compatibility of the client API now that
> >> ClusterClient is not well designed for public usage.
> >> But from my perspective, we should postpone any modification to existing
> >> interfaces until we come to an agreement on new client API. Otherwise,
> our
> >> users may adapt their implementation more than once.
> >>
> >> Regards,
> >> Xiaogang
> >>
> >> Jeff Zhang wrote on Wed, Sep 18, 2019 at 10:49 AM:
> >>
> >>> Thanks for raising this discussion. Overall +1 to merge
> NewClusterClient
> >>> into ClusterClient.
> >>>
> >>> 1. I think it is OK to break the backward compatibility. This current
> >>> client api is not so clean, which already causes issues for downstream
> >> project
> >>> and flink itself.
> >>> In flink scala shell, I notice this kind of non-readable code
> >>> Option[Either
> >>> [MiniCluster , ClusterClient[_]]])
> >>>
> >>>
> >>
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> >>> I also created tickets and PRs to try to simplify it.
> >>> https://github.com/apache/flink/pull/8546
> >>> https://github.com/apache/flink/pull/8533
> >>>   Personally I don't think we need to keep backward compatibility for
> >>> non-well-designed api, otherwise it will bring lots of unnecessary
> >>> overhead.
> >>>
> >>> 2. Another concern is that I notice there're many implementation
> details
> >> in
> >>> ClusterClient. I think we should just expose a thin interface, so maybe
> >> we
> >>> can create an interface ClusterClient which includes as few methods as
> >>> possible, and move all the implementation to AbstractClusterClient.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Zili Chen wrote on Wed, Sep 18, 2019 at 9:46 AM:
> >>>
>  Hi devs,
> 
>  FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
>  class NewClusterClient into ClusterClient because with the effort
>  under FLINK-10392 this bridge class is no longer necessary.
> 
>  Technically, in the current codebase all implementations of the interface
>  NewClusterClient are subclasses of ClusterClient, so the work
>  required is no more than moving method declarations. It helps us use
>  type signature ClusterClient instead of
>    latter if we aren't in a type variable context. This should not affect
>  anything internal in Flink scope.
> 
>  However, as mentioned by Kostas in the JIRA and a previous discussion
>  under a commit[2], it seems that we provide some levels of backward
>  compatibility for ClusterClient and thus it's better to start a public
>  discussion here.
> 
>  There are two concerns from my side.
> 
>  1. How much impact does this proposal bring to users programming directly
>  to ClusterClient?
> 
>  The specific changes here are adding two methods `submitJob` and
>  `requestJobResult` which are already implemented by RestClusterClient
>  and MiniClusterClient. Users would only be affected if they create
>  a class that inherits ClusterClient and doesn't implement these
>  methods. Besides, users who create a class that implements
>  NewClusterClient would be affected by the removal of NewClusterClient.
> 
>  If we have to provide backward compatibility and the impact is no
>  further than those above, we can deprecate NewClusterClient, merge
>  the methods into ClusterClient with a dummy default like throw
>  Exception.
> 
>  2. Why do we provide backward compatibility for ClusterClient?
> 
>  It already surprises Kostas and me while we think ClusterClient is a

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Aljoscha Krettek
Hi,

I think this discussion and the one for FLIP-64 are very connected. To resolve 
the differences, I think we have to think about the basic principles and find
consensus there. The basic questions I see are:

 - Do we want to support overriding builtin functions?
 - Do we want to support overriding catalog functions?
 - And then later: should temporary functions be tied to a catalog/database?

I don’t have much to say about these, except that we should somewhat stick to 
what the industry does. But I also understand that the industry is already very 
divided on this.

Best,
Aljoscha

> On 18. Sep 2019, at 11:41, Jark Wu  wrote:
> 
> Hi,
> 
> +1 to strive for reaching consensus on the remaining topics. We are close to 
> the truth. It will waste a lot of time if we resume the topic some time 
> later. 
> 
> +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to 
> override a catalog function. 
> 
> I’m not sure about “system.system.fun”, it introduces a nonexistent cat & db? 
> And we still need to do special treatment for the dedicated system.system cat 
> & db? 
> 
> Best,
> Jark
> 
> 
> >> On Sep 18, 2019, at 06:54, Timo Walther wrote:
>> 
>> Hi everyone,
>> 
>> @Xuefu: I would like to avoid adding too many things incrementally. Users 
>> should be able to override all catalog objects consistently according to 
>> FLIP-64 (Support for Temporary Objects in Table module). If functions are 
>> treated completely different, we need more code and special cases. From an 
>> implementation perspective, this topic only affects the lookup logic which 
>> is rather low implementation effort which is why I would like to clarify the 
> >> remaining items. As you said, we have a slight consensus on overriding
>> built-in functions; we should also strive for reaching consensus on the 
>> remaining topics.
>> 
>> @Dawid: I like your idea as it ensures registering catalog objects 
>> consistent and the overriding of built-in functions more explicit.
>> 
>> Thanks,
>> Timo
>> 
>> 
>> On 17.09.19 11:59, kai wang wrote:
>>> hi, everyone
>>> I think this flip is very meaningful. it supports functions that can be
>>> shared by different catalogs and dbs, reducing the duplication of functions.
>>> 
> >>> Our group implemented the create-function feature based on flink's sql
> >>> parser module: it stores the parsed function metadata and schema into
> >>> mysql, and also customizes the catalog and sql-client so that custom
> >>> schemas and functions can be loaded. But the function is currently
> >>> global, and is not subdivided according to catalog and db.
>>> 
>>> In addition, I very much hope to participate in the development of this
> >>> flip. I have been paying attention to the community, but have found it
> >>> rather difficult to join.
>>> thank you.
>>> 
>>> Xuefu Z wrote on Tue, Sep 17, 2019 at 11:19 AM:
>>> 
 Thanks to Timo and Dawid for sharing thoughts.
 
 It seems to me that there is a general consensus on having temp functions
 that have no namespaces and overwrite built-in functions. (As a side note
 for comparison, the current user-defined functions are all temporary and
 have no namespaces.)
 
 Nevertheless, I can also see the merit of having namespaced temp functions
 that can overwrite functions defined in a specific cat/db. However, this
 idea appears orthogonal to the former and can be added incrementally.
 
 How about we first implement non-namespaced temp functions now and leave
 the door open for namespaced ones for later releases as the requirement
 might become clearer? This also helps shorten the debate and allows us
 to make some progress along this direction.
 
 As to Dawid's idea of having a dedicated cat/db to host the temporary temp
 functions that don't have namespaces, my only concern is the special
 treatment for a cat/db, which makes code less clean, as evident in treating
 the built-in catalog currently.
 
 Thanks,
 Xuefu
 
 On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
 wysakowicz.da...@gmail.com>
 wrote:
 
> Hi,
> Another idea to consider on top of Timo's suggestion. How about we have a
> special namespace (catalog + database) for built-in objects? This catalog
> would be invisible to users, as Xuefu was suggesting.
> 
> Then users could still override built-in functions, if they fully qualify
> object with the built-in namespace, but by default the common logic of
> current dB & cat would be used.
> 
> CREATE TEMPORARY FUNCTION func ...
> registers a temporary function in the current cat & db
> 
> CREATE TEMPORARY FUNCTION cat.db.func ...
> registers a temporary function in cat.db
> 
> CREATE TEMPORARY FUNCTION system.system.func ...
> Overrides a built-in function with a temporary function
> 
> The built-in/system namespace would not be writable for permanent
 objects.
> WDYT?
> 
> This way I think we can 

[jira] [Created] (FLINK-14119) Clean idle state for RetractableTopNFunction

2019-09-18 Thread Jark Wu (Jira)
Jark Wu created FLINK-14119:
---

 Summary: Clean idle state for RetractableTopNFunction
 Key: FLINK-14119
 URL: https://issues.apache.org/jira/browse/FLINK-14119
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Jark Wu


We cleaned the idle state for AppendOnlyTopNFunction and UpdatableTopNFunction, 
but missed it for RetractableTopNFunction. We should add it to avoid 
unbounded state size. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Aljoscha Krettek
I agree that NewClusterClient and ClusterClient can be merged now that there is 
no pre-FLIP-6 code base anymore.

Side note, there are a lot of methods in ClusterClient that should not really 
be there, in my opinion:
 - all the getOptimizedPlan*() method
 - the run() methods. In the end, only submitJob should be required

We should also see what Till (cc’ed) says, maybe he has an opinion on why the 
separation should be kept.

Best,
Aljoscha

> On 18. Sep 2019, at 11:54, Zili Chen  wrote:
> 
> Hi Xiaogang,
> 
> Thanks for your reply.
> 
> According to the feature discussion thread[1] client API enhancement is a
> planned
> feature towards 1.10 and thus I think this thread is valid if we can reach
> a consensus
> and introduce new client API in this development cycle.
> 
> Best,
> tison.
> 
> [1]
> https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E
> 
> 
> SHI Xiaogang wrote on Wed, Sep 18, 2019 at 3:03 PM:
> 
>> Hi Tison,
>> 
>> Thanks for bringing this.
>> 
>> I think it's fine to break the backward compatibility of the client API now that
>> ClusterClient is not well designed for public usage.
>> But from my perspective, we should postpone any modification to existing
>> interfaces until we come to an agreement on new client API. Otherwise, our
>> users may adapt their implementation more than once.
>> 
>> Regards,
>> Xiaogang
>> 
>> Jeff Zhang wrote on Wed, Sep 18, 2019 at 10:49 AM:
>> 
>>> Thanks for raising this discussion. Overall +1 to merge NewClusterClient
>>> into ClusterClient.
>>> 
>>> 1. I think it is OK to break the backward compatibility. This current
>>> client api is not so clean, which already causes issues for downstream
>> project
>>> and flink itself.
>>> In flink scala shell, I notice this kind of non-readable code
>>> Option[Either
>>> [MiniCluster , ClusterClient[_]]])
>>> 
>>> 
>> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
>>> I also created tickets and PRs to try to simplify it.
>>> https://github.com/apache/flink/pull/8546
>>> https://github.com/apache/flink/pull/8533
>>>   Personally I don't think we need to keep backward compatibility for
>>> non-well-designed api, otherwise it will bring lots of unnecessary
>>> overhead.
>>> 
>>> 2. Another concern is that I notice there're many implementation details
>> in
>>> ClusterClient. I think we should just expose a thin interface, so maybe
>> we
>>> can create an interface ClusterClient which includes as few methods as
>>> possible, and move all the implementation to AbstractClusterClient.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Zili Chen wrote on Wed, Sep 18, 2019 at 9:46 AM:
>>> 
 Hi devs,
 
 FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
 class NewClusterClient into ClusterClient because with the effort
 under FLINK-10392 this bridge class is no longer necessary.
 
 Technically, in the current codebase all implementations of the interface
 NewClusterClient are subclasses of ClusterClient, so the work
 required is no more than moving method declarations. It helps us use
 type signature ClusterClient instead of
 >>> latter if we aren't in a type variable context. This should not affect
 anything internal in Flink scope.
 
 However, as mentioned by Kostas in the JIRA and a previous discussion
 under a commit[2], it seems that we provide some levels of backward
 compatibility for ClusterClient and thus it's better to start a public
 discussion here.
 
 There are two concerns from my side.
 
 1. How much impact does this proposal bring to users programming directly
 to ClusterClient?
 
 The specific changes here are adding two methods `submitJob` and
 `requestJobResult` which are already implemented by RestClusterClient
 and MiniClusterClient. Users would only be affected if they create
 a class that inherits ClusterClient and doesn't implement these
 methods. Besides, users who create a class that implements
 NewClusterClient would be affected by the removal of NewClusterClient.
 
 If we have to provide backward compatibility and the impact is no
 further than those above, we can deprecate NewClusterClient, merge
 the methods into ClusterClient with a dummy default like throw
 Exception.
 
 2. Why do we provide backward compatibility for ClusterClient?
 
 It already surprises Kostas and me while we think ClusterClient is a
 totally internal class which we can evolve regardless of api
 stability. Our community promises api stability by marking class
 and/or method as @Public/@PublicEvolving. It is weird and even
 dangerous that we are somehow forced to provide backward compatibility
 for classes without any annotation.
 
 Besides, as I mention in [2], users who anyway want to program
 directly to internal classes/interfaces are considered to prepare to
 

Re: Retention policy | Memory management.

2019-09-18 Thread Jark Wu
Hi,

The Job1 is a simple ETL job and doesn’t consume much state size (only Kafka 
offset), so it should work well. 
The Job2 is an unbounded join, which will put the data of both input streams
into state in the Join operator. 
As the input streams are unbounded and amount to 100GB per day as you described,
if you are using the Memory statebackend (which is the default one),
then the job will eventually OOM. 

Here are my answers:
>  1. How long does the data reside in my table once I read it? I consume
  100GB per day, so there should be a retention policy, right? If so, where do I
  configure and how?

The data is stored in state. You can specify the retention policy by setting
the “min-idle-state-retention” and “max-idle-state-retention” keys under
“execution” [1] in the environment file if you are using the SQL CLI. 

>  2. Are retention policies specific to tables?

No. It affects all the stateful non-window operations (e.g. GroupBy, Join).

>   3. I have a data set that updates once a day. How about using UPSERT mode?
  If so, how could I delete the existing data set to load the new?

Flink SQL doesn’t support loading a periodically changing data set yet. Maybe you can 
achieve this by implementing a custom source and operators in the DataStream API.

Best,
Jark


[1]: 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files


> On Sep 13, 2019, at 15:43, srikanth flink wrote:
> 
> Hi there,
> 
> I came across Flink and FlinkSQL and using FlinkSQL for stream processing.
> Flink runs as a 3-node cluster with embedded Zookeeper, with an 80GB heap on
> each. I came across a few issues and would like to get some clarification.
> 
>   - Job1: Using Flink(java) to read and flatten my JSON and write to Kafka
>   topic.
> 
> 
>   - Job2: Environment file configured to read from 2 different Kafka
>   topics. I get to join both the tables and are working. The query runs for a
>   while (say an hour) and then fails with *error*.
> 
> Questions:
> 
>   1. How long does the data reside in my table once I read it? I consume
>   100GB per day, so there should be a retention policy, right? If so, where do I
>   configure and how?
>   2. Are retention policies specific to tables?
>   3. I have a data set that updates once a day. How about using UPSERT mode?
>   If so, how could I delete the existing data set to load the new?
> 
> 
> *Query*: SELECT s.* from sourceKafka AS s INNER JOIN badIp AS b ON
> s.`source.ip`=b.ip;
> *Error*: org.apache.flink.util.FlinkException: The assigned slot
> e57d1c0556b4a197eb44d7d9e83e1a47_6 was removed. at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.removeSlot(SlotManagerImpl.java:958)
> at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.removeSlots(SlotManagerImpl.java:928)
> at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.internalUnregisterTaskManager(SlotManagerImpl.java:1149)
> 
> *Environment File*:
> #==
> # Tables
> #==
> 
> # Define tables here such as sources, sinks, views, or temporal tables.
> tables:  # empty list
> # A typical table source definition looks like:
>  - name: sourceKafka
>type: source-table
>update-mode: append
>connector:
>  type: kafka
>  version: "universal" # required: valid connector versions are
>  #   "0.8", "0.9", "0.10", "0.11", and "universal"
>  topic: recon-data-flatten  # required: topic name from which
> the table is read
> 
>  properties: # optional: connector specific properties
>- key: zookeeper.connect
>  value: 1.2.4.1:2181
>- key: bootstrap.servers
>  value: 1.2.4.1:9092
>- key: group.id
>  value: reconDataGroup
>format:
>  type: json
>  fail-on-missing-field: false
>  json-schema: >
>{
>  type: 'object',
>  properties: {
>'source.ip': {
>  type: 'string'
>},
>'source.port': {
>  type: 'string'
>},
>'destination.ip': {
>  type: 'string'
>},
>'destination.port': {
>  type: 'string'
>}
>  }
>}
>  derive-schema: false
> 
>schema:
>  - name: 'source.ip'
>type: VARCHAR
>  - name: 'source.port'
>type: VARCHAR
>  - name: 'destination.ip'
>type: VARCHAR
>  - name: 'destination.port'
>type: VARCHAR
> 
>  - name: badips
>type: source-table
>#update-mode: append
>connector:
>  type: filesystem
>  path: "/home/ipsum/levels/badips.csv"
>format:
>  type: csv
>  fields:
>- name: ip
>  type: VARCHAR
>  comment-prefix: "#"
>schema:
>  - name: ip
>type: VARCHAR
> 
> 

Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Zili Chen
Hi Xiaogang,

Thanks for your reply.

According to the feature discussion thread[1] client API enhancement is a
planned
feature towards 1.10 and thus I think this thread is valid if we can reach
a consensus
and introduce new client API in this development cycle.

Best,
tison.

[1]
https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E


SHI Xiaogang wrote on Wed, Sep 18, 2019 at 3:03 PM:

> Hi Tison,
>
> Thanks for bringing this.
>
> I think it's fine to break the backward compatibility of the client API now that
> ClusterClient is not well designed for public usage.
> But from my perspective, we should postpone any modification to existing
> interfaces until we come to an agreement on new client API. Otherwise, our
> users may adapt their implementation more than once.
>
> Regards,
> Xiaogang
>
> Jeff Zhang wrote on Wed, Sep 18, 2019 at 10:49 AM:
>
> > Thanks for raising this discussion. Overall +1 to merge NewClusterClient
> > into ClusterClient.
> >
> > 1. I think it is OK to break the backward compatibility. This current
> > client api is not so clean, which already causes issues for downstream
> project
> > and flink itself.
> > In flink scala shell, I notice this kind of non-readable code
> > Option[Either
> > [MiniCluster , ClusterClient[_]]])
> >
> >
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> > I also created tickets and PRs to try to simplify it.
> > https://github.com/apache/flink/pull/8546
> > https://github.com/apache/flink/pull/8533
> >Personally I don't think we need to keep backward compatibility for
> > non-well-designed api, otherwise it will bring lots of unnecessary
> > overhead.
> >
> > 2. Another concern is that I notice there're many implementation details
> in
> > ClusterClient. I think we should just expose a thin interface, so maybe
> we
> > can create an interface ClusterClient which includes as few methods as
> > possible, and move all the implementation to AbstractClusterClient.
> >
> >
> >
> >
> >
> >
> >
> > Zili Chen wrote on Wed, Sep 18, 2019 at 9:46 AM:
> >
> > > Hi devs,
> > >
> > > FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
> > > class NewClusterClient into ClusterClient because with the effort
> > > under FLINK-10392 this bridge class is no longer necessary.
> > >
> > > Technically, in the current codebase all implementations of the interface
> > > NewClusterClient are subclasses of ClusterClient, so the work
> > > required is no more than moving method declarations. It helps us use
> > > type signature ClusterClient instead of
> > >  > > latter if we aren't in a type variable context. This should not affect
> > > anything internal in Flink scope.
> > >
> > > However, as mentioned by Kostas in the JIRA and a previous discussion
> > > under a commit[2], it seems that we provide some levels of backward
> > > compatibility for ClusterClient and thus it's better to start a public
> > > discussion here.
> > >
> > > There are two concerns from my side.
> > >
> > > 1. How much impact does this proposal bring to users programming directly
> > > to ClusterClient?
> > >
> > > The specific changes here are adding two methods `submitJob` and
> > > `requestJobResult` which are already implemented by RestClusterClient
> > > and MiniClusterClient. Users would only be affected if they create
> > > a class that inherits ClusterClient and doesn't implement these
> > > methods. Besides, users who create a class that implements
> > > NewClusterClient would be affected by the removal of NewClusterClient.
> > >
> > > If we have to provide backward compatibility and the impact is no
> > > further than those above, we can deprecate NewClusterClient, merge
> > > the methods into ClusterClient with a dummy default like throw
> > > Exception.
> > >
> > > 2. Why do we provide backward compatibility for ClusterClient?
> > >
> > > It already surprises Kostas and me while we think ClusterClient is a
> > > totally internal class which we can evolve regardless of api
> > > stability. Our community promises api stability by marking class
> > > and/or method as @Public/@PublicEvolving. It is weird and even
> > > dangerous that we are somehow forced to provide backward compatibility
> > > for classes without any annotation.
> > >
> > > Besides, as I mention in [2], users who anyway want to program
> > > directly to internal classes/interfaces are expected to be prepared to
> > > make adaptations when bumping the version of Flink.
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-14096
> > > [2]
> > >
> > >
> >
> https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01
> > >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-18 Thread Kurt Young
Hi all,

Sorry to join this party late. Big +1 to this flip, especially for the
dropping
"registerTableSink & registerTableSource" part. These are indeed legacy
and we should try to unify them through CatalogTable after we introduce
the concept of Catalog.

From my understanding, what we register should all be metadata;
TableSource/TableSink should only be the ones responsible for doing
the real work, i.e. reading and writing data according to the schema and
other information like computed columns, partitions, etc.

Best,
Kurt


On Wed, Sep 18, 2019 at 5:14 PM JingsongLee 
wrote:

> After some development and thinking, I have a general understanding.
> +1 to registering a source/sink does not fit into the SQL world.
> I am OK to have a deprecated registerTemporarySource/Sink to stay compatible
> with the old ways.
>
> Best,
> Jingsong Lee
>
>
> --
> From:Timo Walther 
> Send Time: Sep 17, 2019 (Tuesday) 08:00
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi Dawid,
>
> thanks for the design document. It fixes big concept gaps due to
> historical reasons with proper support for serializability and catalog
> support in mind.
>
> I would not mind a registerTemporarySource/Sink, but the problem that I
> see is that many people think that this is the recommended way of
> registering a table source/sink which is not true. We should guide users
> to either use connect() or DDL API which can be validated and stored in
> catalog.
>
> Also from a concept perspective, registering a source/sink does not fit
> into the SQL world. SQL does not know about source/sinks but only about
> tables. If the responsibility of a TableSource/TableSink is just a pure
> physical data consumer/producer that is not connected to the actual
> logical table schema, we would need a possibility of defining time
> attributes and interpreting/converting a changelog. This should be done
> by the framework with information from the DDL/connect() and not be
> defined in every table source.
>
> Regards,
> Timo
>
>
> On 09.09.19 14:16, JingsongLee wrote:
> > Hi dawid:
> >
> > It is difficult to describe specific examples.
> > Sometimes users will generate some java converters through some
> >   Java code, or generate some Java classes through third-party
> >   libraries. Of course, these can be best done through properties.
> > But this requires additional work from users. My suggestion is to
> >   keep this user-friendly way of registering Java instances.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Dawid Wysakowicz 
> > Send Time: Sep 6, 2019 (Friday) 16:21
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
> >
> > Hi all,
> > @Jingsong Could you elaborate a bit more on what you mean by
> > "some Connectors are difficult to convert all states to properties"
> > All the Flink provided connectors will definitely be expressible with
> properties (In the end you should be able to use them from DDL). I think if
> a TableSource is complex enough that it handles filter push down, partition
> support etc. should rather be made available both from DDL & java/scala
> code. I'm happy to reconsider adding registerTemporaryTable(String path,
> TableSource source) if you have some concrete examples in mind.
> >
> >
> > @Xuefu: We also considered the ObjectIdentifier (or actually introducing
> a new identifier representation to differentiate between resolved and
> unresolved identifiers) with the same concerns. We decided to suggest the
> string & parsing logic because of usability.
> >  tEnv.from("cat.db.table")
> > is shorter and easier to write than
> >  tEnv.from(Identifier.for("cat", "db", "name")
> > And also implicitly solves the problem what happens if a user (e.g. used
> to other systems) uses that API in a following manner:
> >  tEnv.from(Identifier.for("db.name")
> > I'm happy to revisit it if the general consensus is that it's better to
> > use the OO approach.
> > Best,
> > Dawid
> >
> > On 06/09/2019 10:00, Xuefu Z wrote:
> >
> > Thanks to Dawid for starting the discussion and writeup. It looks pretty
> > good to me except that I'm a little concerned about the object reference
> > and string parsing in the code, which seems to be an anti-pattern in OOP.
> Have
> > we considered using ObjectIdenitifier with optional catalog and db parts,
> > esp. if we are worried about arguments of variable length or method
> > overloading? It's quite likely that the result of string parsing is an
> > ObjectIdentifier instance any way.
> >
> > Having string parsing logic in the code is a little dangerous as it
> > duplicates part of the DDL/DML parsing, and they can easily get out of
> sync.
> >
> > Thanks,
> > Xuefu
> >
> > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee  .invalid>
> > wrote:
> >
> >
> > Thanks dawid, +1 for this approach.
> >
> > One 

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Jark Wu
Hi,

+1 to striving for consensus on the remaining topics; we are close to a 
conclusion, and it would waste a lot of time if we had to resume the topic later. 

+1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to 
override a catalog function. 

I’m not sure about “system.system.fun”: it introduces a nonexistent cat & db, 
and we would still need special treatment for the dedicated system.system cat & 
db. 

Best,
Jark


> On 18 Sep 2019, at 06:54, Timo Walther wrote:
> 
> Hi everyone,
> 
> @Xuefu: I would like to avoid adding too many things incrementally. Users 
> should be able to override all catalog objects consistently according to 
> FLIP-64 (Support for Temporary Objects in Table module). If functions are 
> treated completely differently, we need more code and special cases. From an 
> implementation perspective, this topic only affects the lookup logic, which is 
> a rather low implementation effort; that is why I would like to clarify the 
> remaining items. As you said, we have a slight consensus on overriding 
> built-in functions; we should also strive for reaching consensus on the 
> remaining topics.
> 
> @Dawid: I like your idea, as it keeps the registration of catalog objects 
> consistent and makes the overriding of built-in functions more explicit.
> 
> Thanks,
> Timo
> 
> 
> On 17.09.19 11:59, kai wang wrote:
>> Hi everyone,
>> I think this FLIP is very meaningful. It supports functions that can be
>> shared by different catalogs and dbs, reducing the duplication of functions.
>> 
>> Based on Flink's sql parser module, our group has implemented a create
>> function feature that stores the parsed function metadata and schema in
>> MySQL. We have also customized the catalog and sql-client so that custom
>> schemas and functions can be loaded, but the function is currently global
>> and is not subdivided by catalog and db.
>> 
>> In addition, I very much hope to participate in the development of this
>> FLIP. I have been paying attention to the community, but found it rather
>> difficult to join.
>> Thank you.
>> 
>> Xuefu Z wrote on Tue, 17 Sep 2019 at 11:19 AM:
>> 
>>> Thanks to Timo and Dawid for sharing your thoughts.
>>> 
>>> It seems to me that there is a general consensus on having temp functions
>>> that have no namespaces and overwrite built-in functions. (As a side note
>>> for comparison, the current user-defined functions are all temporary and
>>> have no namespaces.)
>>> 
>>> Nevertheless, I can also see the merit of having namespaced temp functions
>>> that can overwrite functions defined in a specific cat/db. However, this
>>> idea appears orthogonal to the former and can be added incrementally.
>>> 
>>> How about we first implement non-namespaced temp functions now and leave
>>> the door open for namespaced ones in later releases, as the requirement
>>> might become clearer? This also helps shorten the debate and allows us
>>> to make some progress in this direction.
>>> 
>>> As to Dawid's idea of having a dedicated cat/db to host the temp
>>> functions that don't have namespaces, my only concern is the special
>>> treatment for a cat/db, which makes the code less clean, as is evident in
>>> how the built-in catalog is treated currently.
>>> 
>>> Thanks,
>>> Xuefu
>>> 
>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>> wysakowicz.da...@gmail.com>
>>> wrote:
>>> 
 Hi,
 Another idea to consider on top of Timo's suggestion. How about we have a
 special namespace (catalog + database) for built-in objects? This catalog
 would be invisible to users, as Xuefu was suggesting.
 
 Then users could still override built-in functions if they fully qualify the
 object with the built-in namespace, but by default the common logic of the
 current db & cat would be used.
 
 CREATE TEMPORARY FUNCTION func ...
 registers a temporary function in the current cat & db
 
 CREATE TEMPORARY FUNCTION cat.db.func ...
 registers a temporary function in cat.db
 
 CREATE TEMPORARY FUNCTION system.system.func ...
 overrides a built-in function with a temporary function
 
 The built-in/system namespace would not be writable for permanent objects.
 WDYT?
 
 This way I think we can have benefits of both solutions.
 
 Best,
 Dawid
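 
 A minimal sketch of the lookup order this would imply, assuming a simple
 in-memory registry (the resolver below is illustrative only, not part of
 the FLIP):
 
     import java.util.HashMap;
     import java.util.Map;
     import java.util.Optional;
 
     // Hypothetical resolver: temporary overrides registered under
     // system.system win, then built-in functions, then functions in the
     // current cat & db.
     class FunctionResolver {
         private final Map<String, String> tempSystemOverrides = new HashMap<>();
         private final Map<String, String> builtinFunctions = new HashMap<>();
         private final Map<String, String> currentCatalogDb = new HashMap<>();
 
         Optional<String> resolve(String name) {
             if (tempSystemOverrides.containsKey(name)) {
                 return Optional.of(tempSystemOverrides.get(name));
             }
             if (builtinFunctions.containsKey(name)) {
                 return Optional.of(builtinFunctions.get(name));
             }
             return Optional.ofNullable(currentCatalogDb.get(name));
         }
     }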
 
 
 On Tue, 17 Sep 2019, 07:24 Timo Walther,  wrote:
 
> Hi Bowen,
> 
> I understand the potential benefit of overriding certain built-in
> functions. I'm open to such a feature if many people agree. However, it
> would be great to still support overriding catalog functions with
> temporary functions in order to prototype a query even though a
> catalog/database might not be available currently or should not be
> modified yet. How about we support both cases?
> 
> CREATE TEMPORARY FUNCTION abs
> -> creates/overrides a built-in function and never considers the current 
> catalog and database; inconsistent with other DDL but acceptable for

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-18 Thread JingsongLee
After some development and thinking, I have a general understanding.
+1 to the point that registering a source/sink does not fit into the SQL world.
I am OK with having a deprecated registerTemporarySource/Sink for compatibility 
with the old ways.

Best,
Jingsong Lee


--
From:Timo Walther 
Send Time: 2019-09-17 (Tuesday) 08:00
To:dev 
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi Dawid,

thanks for the design document. It fixes big conceptual gaps that exist for 
historical reasons, with proper support for serializability and catalogs 
in mind.

I would not mind a registerTemporarySource/Sink, but the problem that I 
see is that many people think that this is the recommended way of 
registering a table source/sink, which is not true. We should guide users 
to either the connect() or the DDL API, both of which can be validated and 
stored in a catalog.

Also, from a conceptual perspective, registering a source/sink does not fit 
into the SQL world. SQL does not know about sources/sinks but only about 
tables. If the responsibility of a TableSource/TableSink is just to be a pure 
physical data consumer/producer that is not connected to the actual 
logical table schema, we need a way of defining time 
attributes and interpreting/converting a changelog. This should be done 
by the framework with information from the DDL/connect() and not be 
defined in every table source.
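
A minimal sketch of the DDL path, assuming the preliminary DDL support
available around Flink 1.9 (the table name, schema, and connector properties
are illustrative only):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class DdlRegistrationSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            // The table is defined purely by schema + properties, so it can
            // be validated and stored in a catalog; time attributes and
            // changelog interpretation can then be derived by the framework
            // instead of being hard-wired into every TableSource.
            tEnv.sqlUpdate(
                    "CREATE TABLE orders (" +
                    "  user_name VARCHAR," +
                    "  amount DOUBLE" +
                    ") WITH (" +
                    "  'connector.type' = 'filesystem'," +
                    "  'connector.path' = '/tmp/orders.csv'," +
                    "  'format.type' = 'csv'" +
                    ")");
        }
    }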

Regards,
Timo


On 09.09.19 14:16, JingsongLee wrote:
> Hi Dawid,
>
> It is difficult to give specific examples.
> Sometimes users will generate Java converters through some
>   Java code, or generate Java classes through third-party
>   libraries. Of course, these can also be expressed through properties,
> but that requires additional work from users. My suggestion is to
>   keep the user-friendly way of registering Java instances.
>
> Best,
> Jingsong Lee
>
>
> --
> From:Dawid Wysakowicz 
> Send Time: 2019-09-06 (Friday) 16:21
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> @Jingsong Could you elaborate a bit more on what you mean by
> "some Connectors are difficult to convert all states to properties"?
> All the Flink-provided connectors will definitely be expressible with 
> properties (in the end you should be able to use them from DDL). I think a 
> TableSource that is complex enough to handle filter push-down, partition 
> support, etc. should rather be made available both from DDL & java/scala code. 
> I'm happy to reconsider adding registerTemporaryTable(String path, 
> TableSource source) if you have some concrete examples in mind.
>
>
> @Xuefu: We also considered the ObjectIdentifier (or actually introducing a 
> new identifier representation to differentiate between resolved and 
> unresolved identifiers) with the same concerns. We decided to suggest the 
> string & parsing logic because of usability.
>  tEnv.from("cat.db.table")
> is shorter and easier to write than
>  tEnv.from(Identifier.for("cat", "db", "name"))
> And also implicitly solves the problem what happens if a user (e.g. used to 
> other systems) uses that API in a following manner:
>  tEnv.from(Identifier.for("db.name"))
> I'm happy to revisit it if the general consensus is that it's better to use 
> the OO approach.
> Best,
> Dawid
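
A rough sketch of the OO alternative being weighed here; the Identifier class
below is hypothetical, not an existing Flink type:

    // Hypothetical identifier with 1 to 3 parts (catalog, database, name),
    // sketching the alternative to string parsing discussed above.
    final class Identifier {
        private final String[] parts;

        private Identifier(String... parts) {
            if (parts.length < 1 || parts.length > 3) {
                throw new IllegalArgumentException("1 to 3 parts expected");
            }
            this.parts = parts;
        }

        static Identifier of(String... parts) {
            return new Identifier(parts);
        }

        @Override
        public String toString() {
            return String.join(".", parts);
        }
    }

    // Usage would be tEnv.from(Identifier.of("cat", "db", "table")) versus
    // tEnv.from("cat.db.table"); the string form is shorter, but its parsing
    // logic must stay in sync with the SQL parser.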
>
> On 06/09/2019 10:00, Xuefu Z wrote:
>
> Thanks to Dawid for starting the discussion and writeup. It looks pretty
> good to me except that I'm a little concerned about the object reference
> and string parsing in the code, which seems to be an anti-pattern in OOP. Have
> we considered using ObjectIdentifier with optional catalog and db parts,
> esp. if we are worried about arguments of variable length or method
> overloading? It's quite likely that the result of string parsing is an
> ObjectIdentifier instance any way.
>
> Having string parsing logic in the code is a little dangerous as it
> duplicates part of the DDL/DML parsing, and they can easily get out of sync.
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee 
> wrote:
>
>
> Thanks Dawid, +1 for this approach.
>
> One concern is the removal of registerTableSink & registerTableSource
>   in TableEnvironment. It has two alternatives:
> 1. the properties approach (DDL, descriptor).
> 2. from/toDataStream.
>
> #1 can only carry properties, not Java state, and for some Connectors
>   it is difficult to convert all state to properties.
> #2 can contain Java state, but can't use TableSource-related features
> like project & filter push-down, partition support, etc. (a sketch follows
> below).
>
> Any idea about this?
>
> Best,
> Jingsong Lee
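
A rough sketch of alternative #2, assuming the 1.9-era APIs (Order and
MyStatefulOrderSource are hypothetical user classes carrying Java state):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    public class FromDataStreamSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // The source instance may carry arbitrary Java state ...
            DataStream<Order> stream = env.addSource(new MyStatefulOrderSource());

            // ... but the resulting table cannot use TableSource features
            // such as projection/filter push-down or partition support.
            Table orders = tEnv.fromDataStream(stream, "userName, amount");
        }
    }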
>
>
> --
> From:Dawid Wysakowicz 
> Send Time: 2019-09-04 (Wednesday) 22:20
> To:dev 
> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>
> Hi all,
> As part of FLIP-30 a Catalog API was introduced that 

[jira] [Created] (FLINK-14118) Reduce the unnecessary flushing when there is no data available for flush

2019-09-18 Thread Yingjie Cao (Jira)
Yingjie Cao created FLINK-14118:
---

 Summary: Reduce the unnecessary flushing when there is no data 
available for flush
 Key: FLINK-14118
 URL: https://issues.apache.org/jira/browse/FLINK-14118
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Reporter: Yingjie Cao
 Fix For: 1.10.0


The new flush implementation, which works by triggering a netty user event, may 
cause a performance regression compared to the old synchronization-based one. 
More specifically, when there is exactly one BufferConsumer in the buffer queue 
of a subpartition and no new data will be added for a while (maybe because there 
is just no input, or because the operator collects some data for processing and 
does not emit records immediately), i.e. there is no data to send, the 
OutputFlusher will continuously notify data availability and wake up the netty 
thread, though no data will be returned by the pollBuffer method.

For some of our production jobs, this incurs 20% to 40% CPU overhead compared 
to the old implementation. We tried to fix the problem by checking whether there 
is new data available when flushing; if there is no new data, the netty thread 
is not notified. This works for our jobs, and the CPU usage falls back to the 
previous level.
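
An illustrative sketch of the fix (this is not the actual Flink subpartition
code; all names below are made up for illustration):

    import java.util.ArrayDeque;

    class FlushSketch {
        private final ArrayDeque<Object> buffers = new ArrayDeque<>();
        private final Runnable nettyNotifier; // wakes up the netty thread
        private boolean newDataSinceLastPoll;

        FlushSketch(Runnable nettyNotifier) {
            this.nettyNotifier = nettyNotifier;
        }

        void add(Object bufferConsumer) {
            synchronized (buffers) {
                buffers.add(bufferConsumer);
                newDataSinceLastPoll = true;
            }
        }

        void flush() {
            final boolean notify;
            synchronized (buffers) {
                // The added check: only notify when there is actually new
                // data to poll, so the OutputFlusher no longer wakes the
                // netty thread when pollBuffer would return nothing.
                notify = newDataSinceLastPoll && !buffers.isEmpty();
            }
            if (notify) {
                nettyNotifier.run();
            }
        }

        Object pollBuffer() {
            synchronized (buffers) {
                newDataSinceLastPoll = false;
                return buffers.poll();
            }
        }
    }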



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14117) Translate changes on documentation index page to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14117:
-

 Summary: Translate changes on documentation index page to Chinese
 Key: FLINK-14117
 URL: https://issues.apache.org/jira/browse/FLINK-14117
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The changes of commit 
[ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9|https://github.com/apache/flink/commit/ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9#diff-1a523bd9fa0dbf998008b37579210e12]
 on the documentation index page should be translated to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14116) Translate changes on Getting Started Overview to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14116:
-

 Summary: Translate changes on Getting Started Overview to Chinese
 Key: FLINK-14116
 URL: https://issues.apache.org/jira/browse/FLINK-14116
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The changes of commit 
[https://github.com/apache/flink/blob/master/docs/getting-started/walkthroughs/datastream_api.zh.md|https://github.com/apache/flink/commit/ee0d6fdf0604d74bd1cf9a6eb9cf5338ac1aa4f9#diff-759f29741e3adc9d9cdc95c996f25869]
 on the Getting Started Overview should be translated to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14115) Translate DataStream Code Walkthrough to Chinese

2019-09-18 Thread Fabian Hueske (Jira)
Fabian Hueske created FLINK-14115:
-

 Summary: Translate DataStream Code Walkthrough to Chinese
 Key: FLINK-14115
 URL: https://issues.apache.org/jira/browse/FLINK-14115
 Project: Flink
  Issue Type: Task
  Components: chinese-translation, Documentation
Affects Versions: 1.10.0
Reporter: Fabian Hueske


The new DataStream Code Walkthrough should be translated to Chinese:

https://github.com/apache/flink/blob/master/docs/getting-started/walkthroughs/datastream_api.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread SHI Xiaogang
Hi Tison,

Thanks for bringing this.

I think it's fine to break the backward compatibility of the client API, given
that ClusterClient is not well designed for public usage.
But from my perspective, we should postpone any modification to existing
interfaces until we come to an agreement on the new client API. Otherwise, our
users may have to adapt their implementations more than once.

Regards,
Xiaogang

Jeff Zhang wrote on Wed, 18 Sep 2019 at 10:49 AM:

> Thanks for raising this discussion. Overall +1 to merge NewClusterClient
> into ClusterClient.
>
> 1. I think it is OK to break the backward compatibility. The current
> client api is not so clean, which has already caused issues for downstream
> projects and flink itself.
> In flink scala shell, I noticed this kind of non-readable code:
> Option[Either[MiniCluster, ClusterClient[_]]]
>
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> I also created tickets and PRs to try to simplify it.
> https://github.com/apache/flink/pull/8546
> https://github.com/apache/flink/pull/8533
> Personally, I don't think we need to keep backward compatibility for a
> non-well-designed api; otherwise it will bring lots of unnecessary
> overhead.
>
> 2. Another concern is that I notice there are many implementation details in
> ClusterClient. I think we should just expose a thin interface, so maybe we
> can create an interface ClusterClient which includes as few methods as
> possible, and move all the implementation to AbstractClusterClient.
>
> Zili Chen wrote on Wed, 18 Sep 2019 at 9:46 AM:
>
> > Hi devs,
> >
> > FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
> > class NewClusterClient into ClusterClient, because with the effort
> > under FLINK-10392 this bridge class is no longer necessary.
> >
> > Technically, in the current codebase all implementations of the interface
> > NewClusterClient are subclasses of ClusterClient, so the work
> > required is no more than moving method declarations. It helps us use
> > the type signature ClusterClient instead of an intersection type such as
> > <T extends ClusterClient & NewClusterClient>; we cannot use the
> > latter if we aren't in a type variable context. This should not affect
> > anything internal in Flink scope.
> >
> > However, as mentioned by Kostas in the JIRA and a previous discussion
> > under a commit[2], it seems that we provide some level of backward
> > compatibility for ClusterClient and thus it's better to start a public
> > discussion here.
> >
> > There are two concerns from my side.
> >
> > 1. How much impact this proposal brings to users programming directly
> > to ClusterClient?
> >
> > The specific changes here are adding two methods `submitJob` and
> > `requestJobResult` which are already implemented by RestClusterClient
> > and MiniClusterClient. Users would only be affected if they create
> > a class that inherits ClusterClient and doesn't implement these
> > methods. Besides, users who create a class that implements
> > NewClusterClient would be affected by the removal of NewClusterClient.
> >
> > If we have to provide backward compatibility and the impact is no
> > further than those above, we can deprecate NewClusterClient, merge
> > the methods into ClusterClient with a dummy default that throws an
> > Exception.
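
A sketch of this backward-compatible variant; the two method signatures are
the ones named in this thread, while the bodies and exact types are
illustrative:

    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.JobSubmissionResult;
    import org.apache.flink.runtime.jobgraph.JobGraph;
    import org.apache.flink.runtime.jobmaster.JobResult;

    public abstract class ClusterClient<T> {

        // Merged from NewClusterClient, with a dummy default so that
        // existing subclasses which never implemented it keep compiling.
        public CompletableFuture<JobSubmissionResult> submitJob(JobGraph jobGraph) {
            throw new UnsupportedOperationException(
                    "submitJob is not supported by this client.");
        }

        public CompletableFuture<JobResult> requestJobResult(JobID jobId) {
            throw new UnsupportedOperationException(
                    "requestJobResult is not supported by this client.");
        }
    }

    /** @deprecated All methods have been merged into ClusterClient. */
    @Deprecated
    interface NewClusterClient {}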
> >
> > 2. Why do we provide backward compatibility for ClusterClient?
> >
> > It already surprised Kostas and me, as we think ClusterClient is a
> > totally internal class which we can evolve regardless of api
> > stability. Our community promises api stability by marking classes
> > and/or methods as @Public/@PublicEvolving. It is weird and even
> > dangerous that we are somehow forced to provide backward compatibility
> > for classes without any annotation.
> >
> > Besides, as I mentioned in [2], users who want to program
> > directly against internal classes/interfaces should be prepared to
> > make adaptations when bumping the version of Flink.
> >
> > Best,
> > tison.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-14096
> > [2]
> >
> >
> https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


[jira] [Created] (FLINK-14114) Shift down ClusterClient#timeout to RestClusterClient

2019-09-18 Thread TisonKun (Jira)
TisonKun created FLINK-14114:


 Summary: Shift down ClusterClient#timeout to RestClusterClient
 Key: FLINK-14114
 URL: https://issues.apache.org/jira/browse/FLINK-14114
 Project: Flink
  Issue Type: Sub-task
  Components: Client / Job Submission
Reporter: TisonKun
 Fix For: 1.10.0


{{ClusterClient#timeout}} is only used in {{RestClusterClient}}; even without 
this prerequisite we can always shift the {{timeout}} field down to subclasses of 
{{ClusterClient}}. This is a step towards an interface-ized {{ClusterClient}}. As 
a side effect, we could reduce the dependency on parsing durations with Scala's 
Duration on the fly.

CC [~till.rohrmann] [~zhuzh]
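
A sketch of the shift-down (illustrative only; the interface-ized 
{{ClusterClient}} does not exist yet):

    import java.time.Duration;

    // The base type keeps no notion of a timeout at all ...
    interface ClusterClient {}

    // ... while RestClusterClient owns the timeout it actually uses, parsed
    // once up front instead of with Scala's Duration on the fly.
    class RestClusterClient implements ClusterClient {
        private final Duration timeout;

        RestClusterClient(Duration timeout) {
            this.timeout = timeout;
        }
    }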



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Zili Chen
Hi Jeff,

Thanks for your reply.

The ongoing client API enhancement thread[1] is mainly aimed at dealing with
issues of our client API; as you mentioned, the current client API is not so
clean.

Because the client API naturally becomes a public & user-facing interface, it is
expected that we start a series of discussions on how the interface should
look. However, it isn't expected that we have to talk about backward
compatibility too much in this scope.

I agree that it is painful if we always keep compatibility for a
non-well-designed API, and in this specific scenario we would even promote
such an API to Public.
As mentioned in the discussion under [2], I think it could be the time
to discuss our InterfaceAudience policy. At least it would be a pity if
we don't address this InterfaceAudience issue towards 2.0. But let's say it
could be a separate thread.

Regarding exposing a thin interface and moving all the implementation to
AbstractClusterClient: I think the community consensus is towards a
ClusterClient interface, and thus there is no need for an
AbstractClusterClient.
As for implementation details, they are mainly a series of #run methods. We
will gradually exclude them from ClusterClient, and it is the responsibility
of the Executor in the design document[3] to take care of compilation and
deployment.

BTW, I took a look at the pull requests you linked to. In fact, I created a
similar issue[4] and am also considering simplifying the code in
flink-scala-shell. Let's move the detailed discussion to the corresponding
issues and pull requests, or start another thread then. I don't intend to
cover a lot of concerns under this thread :-)

Best,
tison.

[1]
https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
[2]
https://github.com/apache/flink/commit/dc9e4494dddfed36432e6bbf6cd3231530bc2e01#commitcomment-34980406
[3]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#
[4] https://issues.apache.org/jira/browse/FLINK-13961

Jeff Zhang wrote on Wed, 18 Sep 2019 at 10:49 AM:

> Thanks for raising this discussion. Overall +1 to merge NewClusterClient
> into ClusterClient.
>
> 1. I think it is OK to break the backward compatibility. The current
> client api is not so clean, which has already caused issues for downstream
> projects and flink itself.
> In flink scala shell, I noticed this kind of non-readable code:
> Option[Either[MiniCluster, ClusterClient[_]]]
>
> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
> I also created tickets and PRs to try to simplify it.
> https://github.com/apache/flink/pull/8546
> https://github.com/apache/flink/pull/8533
> Personally, I don't think we need to keep backward compatibility for a
> non-well-designed api; otherwise it will bring lots of unnecessary
> overhead.
>
> 2. Another concern is that I notice there are many implementation details in
> ClusterClient. I think we should just expose a thin interface, so maybe we
> can create an interface ClusterClient which includes as few methods as
> possible, and move all the implementation to AbstractClusterClient.
>
> Zili Chen wrote on Wed, 18 Sep 2019 at 9:46 AM:
>
> > Hi devs,
> >
> > FLINK-14096[1] was created yesterday. It is aimed at merging the bridge
> > class NewClusterClient into ClusterClient, because with the effort
> > under FLINK-10392 this bridge class is no longer necessary.
> >
> > Technically, in the current codebase all implementations of the interface
> > NewClusterClient are subclasses of ClusterClient, so the work
> > required is no more than moving method declarations. It helps us use
> > the type signature ClusterClient instead of an intersection type such as
> > <T extends ClusterClient & NewClusterClient>; we cannot use the
> > latter if we aren't in a type variable context. This should not affect
> > anything internal in Flink scope.
> >
> > However, as mentioned by Kostas in the JIRA and a previous discussion
> > under a commit[2], it seems that we provide some levels of backward
> > compatibility for ClusterClient and thus it's better to start a public
> > discussion here.
> >
> > There are two concerns from my side.
> >
> > 1. How much impact this proposal brings to users programming directly
> > to ClusterClient?
> >
> > The specific changes here are adding two methods `submitJob` and
> > `requestJobResult` which are already implemented by RestClusterClient
> > and MiniClusterClient. Users would only be affected if they create
> > a class that inherits ClusterClient and doesn't implement these
> > methods. Besides, users who create a class that implements
> > NewClusterClient would be affected by the removal of NewClusterClient.
> >
> > If we have to provide backward compatibility and the impact is no
> > further than those above, we can deprecate NewClusterClient, merge
> > the methods into ClusterClient with a dummy default that throws an
> > Exception.
> >
> > 2. Why do we provide backward compatibility for ClusterClient?
> >
> > It already surprised Kostas and me, as we think ClusterClient