Re: [DISCUSS] Flink SQL DDL Design

2018-12-17 Thread Jark Wu
Hi Timo,

I think I get your point about why it would be better to put Table Update Mode in
the MVP. But because this is a sophisticated problem, we need to think about it
carefully and have some discussions offline. We will report back here when
we have a clear design.


8). Support row/map/array data type
Do you mean how to distinguish int[] and Integer[]?  Yes, maybe we need to
support NULL/NOT NULL just for array elements, such as: ARRAY<INT NOT NULL>
is int[], ARRAY<INT> is Integer[].
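For illustration, a minimal sketch of how element nullability could look in a
CREATE TABLE (table, column, and connector names here are hypothetical; the
syntax is not final):

```
CREATE TABLE sensor_readings (
  id BIGINT,
  -- NOT NULL element type: would map to a primitive int[] on the Java side
  raw_values ARRAY<INT NOT NULL>,
  -- nullable element type: would map to an object Integer[] on the Java side
  calibrated_values ARRAY<INT>
) WITH (
  connector.type = 'filesystem'
);
```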


Cheers,
Jark

On Fri, 14 Dec 2018 at 19:46, Timo Walther  wrote:

> Hi all,
>
> I think we should discuss what we consider an MVP DDL. For me, an MVP
> DDL was to just focus on a CREATE TABLE statement. It would be great to
> come up with a solution that finally solves the issue of connecting
> different kinds of systems. One reason why we postponed DDL statements
> for quite some time is that we cannot change it easily once released.
>
> However, the current state of the discussion can be summarized by the
> following functionality:
>
> 1. Only support append source tables (because the distinction of
> update/retract table is not clear).
> 2. Only support append and update sink tables (because a changeflag is
> missing).
> 3. Don't support outputting to Kafka with time attributes (because we
> cannot set a timestamp).
>
> Personally, I would like to have more use cases enabled by solving the
> header timestamps and change flag discussion. And I don't see a reason
> why we have to rush here.
>
> 8). Support row/map/array data type
> How do we want to support object arrays vs. primitive arrays? Currently,
> we need to make this clear distinction between external systems and
> Java [1] (e.g. byte[] arrays vs. object arrays) and users can choose
> between Types.PRIMITIVE_ARRAY and Types.OBJECT_ARRAY. Otherwise we need
> to support NULL/NOT NULL for array elements.
>
> 4) Event-Time Attributes and Watermarks
> I completely agree with Rong here. `ts AS SYSTEMROWTIME()` indicates
> that the system takes care of this column and for unification this would
> mean both for sources and sinks. It is still a computed column but gives
> hints to connectors. Implementing connectors can choose if they want to
> use this hint or not. The Flink Kafka connector would make use of it.
> @Jark: I think a PERSISTED keyword would confuse users (as shown by your
> Stackoverflow question) and would only make sense for SYSTEMROWTIME and
> no other computed column.
>
> 3) SOURCE / SINK / BOTH
> @Jark: My initial suggestion was to make the SOURCE/SINK optional such
> that users can only use CREATE TABLE depending on the use case. But as I
> said before, since I cannot find support here, we can drop the keywords.
>
> 7) Table Update Mode
> @Jark: The questions that you posted are exactly the ones that we should
> find an answer for. Because a DDL should just be the front end to the
> characteristics of an engine. After thinking about it again a change
> flag is actually more similar to a PARTITION BY clause because it
> defines a field that is not in the table's schema but in the schema of
> the physical format. However, the columns defined by a PARTITION BY are
> shown when describing/projecting a table whereas a change flag column
> must not be shown.
>
> If a table source supports append, upserts, and retractions, we need a
> way to express how we want to connect to the system.
>
> hasPrimaryKey() && !hasChangeFlag() -> append mode
> hasPrimaryKey() && hasChangeFlag() -> upsert mode
> !hasPrimaryKey() && hasChangeFlag() -> retract mode
>
> Are we fine with this?
>
> Regarding reading `topic`, `partition`, `offset` or custom properties
> from message headers. I already discussed this in my unified connector
> document. We don't need built-in functions for all these properties.
> Those things depend on the connector and format; it is their
> responsibility to extend the table schema in order to expose those
> properties (e.g. by providing a Map<String, String> for all these kinds
> of properties).
>
> Example:
>
> CREATE TABLE myTopic (
>  col1 INT,
>  col2 VARCHAR,
>  col3 MAP<VARCHAR, VARCHAR>,
>  col4 AS SYSTEMROWTIME()
> )
> PARTITION BY (col0 LONG)
> WITH (
>connector.type = kafka
>format.type = key-value-metadata
>format.key-format.type = avro
>format.value-format.type = json
> )
>
> The format specifies that a KeyedDeserializationSchema is used, which extends the
> schema with a metadata column. The PARTITION BY declares the columns for
> Kafka's key in Avro format. col1 to col2 are Kafka's JSON columns.
>
> Thanks for your feedback,
> Timo
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#type-strings
>
>
> On 13.12.18 09:50, Jark Wu wrote:
> > Hi all,
> >
> > Here are a bunch of my thoughts:
> >
> > 8). support row/map/array data type
> > That's fine with me if we want to support them in the MVP. In my mind, we
> > can have the field type syntax like this:
> >
> > ```
> > fieldType ::=
> >  {
> >  sim

Re: [DISCUSS] Flink SQL DDL Design

2018-12-14 Thread Timo Walther

Hi all,

I think we should discuss what we consider an MVP DDL. For me, an MVP 
DDL was to just focus on a CREATE TABLE statement. It would be great to 
come up with a solution that finally solves the issue of connecting 
different kinds of systems. One reason why we postponed DDL statements
for quite some time is that we cannot change it easily once released.


However, the current state of the discussion can be summarized by the 
following functionality:


1. Only support append source tables (because the distinction of 
update/retract table is not clear).
2. Only support append and update sink tables (because a changeflag is 
missing).
3. Don't support outputting to Kafka with time attributes (because we 
cannot set a timestamp).


Personally, I would like to have more use cases enabled by solving the 
header timestamps and change flag discussion. And I don't see a reason 
why we have to rush here.


8). Support row/map/array data type
How do we want to support object arrays vs. primitive arrays? Currently,
we need to make this clear distinction between external systems and
Java [1] (e.g. byte[] arrays vs. object arrays) and users can choose
between Types.PRIMITIVE_ARRAY and Types.OBJECT_ARRAY. Otherwise we need
to support NULL/NOT NULL for array elements.


4) Event-Time Attributes and Watermarks
I completely agree with Rong here. `ts AS SYSTEMROWTIME()` indicates 
that the system takes care of this column and for unification this would 
mean both for sources and sinks. It is still a computed column but gives 
hints to connectors. Implementing connectors can choose if they want to 
use this hint or not. The Flink Kafka connector would make use of it.
@Jark: I think a PERSISTED keyword would confuse users (as shown by your 
Stackoverflow question) and would only make sense for SYSTEMROWTIME and 
no other computed column.


3) SOURCE / SINK / BOTH
@Jark: My initial suggestion was to make the SOURCE/SINK optional such 
that users can only use CREATE TABLE depending on the use case. But as I 
said before, since I cannot find support here, we can drop the keywords.


7) Table Update Mode
@Jark: The questions that you posted are exactly the ones that we should 
find an answer for. Because a DDL should just be the front end to the 
characteristics of an engine. After thinking about it again a change 
flag is actually more similar to a PARTITION BY clause because it 
defines a field that is not in the table's schema but in the schema of 
the physical format. However, the columns defined by a PARTITION BY are 
shown when describing/projecting a table whereas a change flag column 
must not be shown.


If a table source supports append, upserts, and retractions, we need a 
way to express how we want to connect to the system.


hasPrimaryKey() && !hasChangeFlag() -> append mode
hasPrimaryKey() && hasChangeFlag() -> upsert mode
!hasPrimaryKey() && hasChangeFlag() -> retract mode

Are we fine with this?
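To make the mapping above concrete, here is a rough sketch using the
CHANGE_FLAG syntax proposed earlier in this thread (illustrative only;
names and syntax are not final):

```
-- hasPrimaryKey() && !hasChangeFlag() -> append mode
CREATE TABLE t_append (
  id BIGINT,
  msg VARCHAR,
  PRIMARY KEY (id)
) WITH (
  type=kafka
  ,...
);

-- hasPrimaryKey() && hasChangeFlag() -> upsert mode
CREATE TABLE t_upsert (
  CHANGE_FLAG FOR isUpsert,
  id BIGINT,
  msg VARCHAR,
  PRIMARY KEY (id)
) WITH (
  type=kafka
  ,...
);

-- !hasPrimaryKey() && hasChangeFlag() -> retract mode
CREATE TABLE t_retract (
  CHANGE_FLAG FOR isRetraction,
  id BIGINT,
  msg VARCHAR
) WITH (
  type=kafka
  ,...
);
```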

Regarding reading `topic`, `partition`, `offset` or custom properties 
from message headers. I already discussed this in my unified connector 
document. We don't need built-in functions for all these properties. 
Those things depend on the connector and format; it is their
responsibility to extend the table schema in order to expose those
properties (e.g. by providing a Map<String, String> for all these kinds
of properties).


Example:

CREATE TABLE myTopic (
    col1 INT,
    col2 VARCHAR,
    col3 MAP<VARCHAR, VARCHAR>,
    col4 AS SYSTEMROWTIME()
)
PARTITION BY (col0 LONG)
WITH (
  connector.type = kafka
  format.type = key-value-metadata
  format.key-format.type = avro
  format.value-format.type = json
)

The format specifies that a KeyedDeserializationSchema is used, which extends the
schema with a metadata column. The PARTITION BY declares the columns for
Kafka's key in Avro format. col1 to col2 are Kafka's JSON columns.
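As a hypothetical usage example, a query could then project the exposed
metadata from such a table (the map keys shown are illustrative; they depend
on what the connector/format actually exposes):

```
SELECT
  col1,
  col2,
  col3['partition'] AS kafka_partition,
  col3['offset']    AS kafka_offset
FROM myTopic;
```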


Thanks for your feedback,
Timo

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#type-strings



On 13.12.18 09:50, Jark Wu wrote:

Hi all,

Here are a bunch of my thoughts:

8). support row/map/array data type
That's fine with me if we want to support them in the MVP. In my mind, we
can have the field type syntax like this:

```
fieldType ::=
 {
 simpleType
  | MAP
  | ARRAY
  | ROW
 }
```

I have included this in @Shuyi's summary doc [1]. Please leave feedback
there!

[1]
https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit

3) SOURCE / SINK / BOTH
@Timo, a CREATE TABLE statement registers a virtual table in the session
or catalog. I don't think it is immutable, as we might also want to support
CREATE INDEX statements in the future. On the other hand, an ACL is not part
of the table definition; it belongs to the permission system, which is
usually stored somewhere else. So GRANT/INVOKE sounds like a more
standard option.

7) Table Update Mode
I agree with @Shuy

Re: [DISCUSS] Flink SQL DDL Design

2018-12-13 Thread Jark Wu
Hi all,

Here are a bunch of my thoughts:

8). support row/map/array data type
That's fine with me if we want to support them in the MVP. In my mind, we
can have the field type syntax like this:

```
fieldType ::=
{
simpleType
 | MAP
 | ARRAY
 | ROW
}
```
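For example, a table using such nested types could look roughly like this
(a sketch only; the exact generic parameter syntax for MAP/ARRAY/ROW is still
open):

```
CREATE TABLE user_events (
  user_id BIGINT,
  tags ARRAY<VARCHAR>,
  properties MAP<VARCHAR, VARCHAR>,
  address ROW<city VARCHAR, zip VARCHAR>
) WITH (
  connector.type = 'kafka'
);
```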

I have included this in @Shuyi's summary doc [1]. Please leave feedback
there!

[1]
https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit

3) SOURCE / SINK / BOTH
@Timo, a CREATE TABLE statement registers a virtual table in the session
or catalog. I don't think it is immutable, as we might also want to support
CREATE INDEX statements in the future. On the other hand, an ACL is not part
of the table definition; it belongs to the permission system, which is
usually stored somewhere else. So GRANT/INVOKE sounds like a more
standard option.

7) Table Update Mode
I agree with @Shuyi that table update mode can be left out from the MVP.
Because IMO, the update mode will not break the current MVP design. It
should be something to add, like the CHANGE_FLAG you proposed. We can
continue this discussion when we finalize the MVP.

Meanwhile, the update mode is a big topic which may take several weeks
to discuss. For example: (a) do we support CHANGE_FLAG when the table
supports upsert (or when the table defines a primary key)? (b) the
CHANGE_FLAG should support both reading and writing. (c) currently, we only
support true (add) and false (retract) flag types; are they enough? (d) how
to connect to an external storage which also supports insert/delete flags,
like a MySQL binlog?

Regarding the CHANGE_FLAG @Timo proposed, I think this is a good
direction. But should isRetraction be a physical field, with CHANGE_FLAG
acting like a constraint on it? If yes, then what is the type of isRetraction?

4.b) Ingesting and writing timestamps to systems.
@Shuyi, PERSISTED can solve the problem that the field is not physically
stored. However, it doesn't solve the problem of how to write a field
back to the computed column, because "A computed column cannot be the
target of an INSERT or UPDATE statement" even if the computed column is
persisted. If we want to write a rowtime back to the external system, the
DML should look like this: "INSERT INTO sink SELECT a, rowtime FROM
source". The point is that the `rowtime` must be specified in the INSERT
statement; that's why I hope the `rowtime` field in the Table is not a computed
column. See more information about PERSISTED [2] [3].

Another point to consider is that SYSTEMROWTIME() only solves reading the timestamp
from the message header. There are many similar requirements here,
such as reading `topic`, `partition`, `offset` or custom properties from
message headers. Do we plan to support a bunch of built-in functions like
SYSTEMROWTIME()?  Do we have some clean and easy way for this?

[2]:
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-computed-column-definition-transact-sql?view=sql-server-2017
[3]:
https://stackoverflow.com/questions/51390531/sql-server-persisted-computed-columns-versus-actual-normal-column

Looking forward to collaborate with you guys!

Best,
Jark


On Thu, 13 Dec 2018 at 01:38, Rong Rong  wrote:

> Thanks for the summary effort @shuyi. Sorry for jumping in the discussion
> so late.
>
> As for the scope of the MVP, I think we might want to consider adding the "table
> update mode" problem to it. I agree with @timo that it might not be easily
> changed in the future if the flag has to be part of the schema/column
> definition.
>
> Regarding the components under discussion.
> 4) Event-Time Attributes and Watermarks
> b, c) I actually like the special indicator way @fabian suggested to hint
> Flink to read time attributes directly from the system, not the data: `(ts AS
> SYSTEMROWTIME())`. It should also address the "computed field not emitted"
> problem by carrying the "virtual column" concept like @shuyi suggested.
> However, if I understand correctly, this is also required to be defined as part
> of the schema/column definition.
>
> 3) SOURCE / SINK / BOTH
> +1 on not adding properties to `CREATE TABLE` to manage ACL/permission.
>
> On a higher level, one question I have is whether we can
> definitively come to an agreement that the features under discussion (and
> potential solutions) can be cleanly adjusted/added on top of what we are
> providing in the MVP (e.g. the schema/column definition might be hard to
> achieve, but if we all agree ACL/permission should not be part of
> `CREATE TABLE`, a decision can be made later). @shuyi I can also help in
> drafting the FLIP doc by summarizing the features under discussion and the
> concerns about whether they are included in the MVP, so that we can carry on the
> discussions alongside the MVP implementation effort. I think each one
> of these features deserves a dedicated subsection.
>
> Many thanks,
> Rong
>
>
> On Wed, Dec 12, 2018 at 1:14 AM Shuyi Chen  wrot

Re: [DISCUSS] Flink SQL DDL Design

2018-12-12 Thread Rong Rong
Thanks for the summary effort @shuyi. Sorry for jumping in the discussion
so late.

As for the scope of the MVP, I think we might want to consider adding the "table
update mode" problem to it. I agree with @timo that it might not be easily
changed in the future if the flag has to be part of the schema/column
definition.

Regarding the components under discussion.
4) Event-Time Attributes and Watermarks
b, c) I actually like the special indicator way @fabian suggested to hint
Flink to read time attributes directly from the system, not the data: `(ts AS
SYSTEMROWTIME())`. It should also address the "computed field not emitted"
problem by carrying the "virtual column" concept like @shuyi suggested.
However, if I understand correctly, this is also required to be defined as part
of the schema/column definition.

3) SOURCE / SINK / BOTH
+1 on not adding properties to `CREATE TABLE` to manage ACL/permission.

On a higher level, one question I have is whether we can
definitively come to an agreement that the features under discussion (and
potential solutions) can be cleanly adjusted/added on top of what we are
providing in the MVP (e.g. the schema/column definition might be hard to
achieve, but if we all agree ACL/permission should not be part of
`CREATE TABLE`, a decision can be made later). @shuyi I can also help in
drafting the FLIP doc by summarizing the features under discussion and the
concerns about whether they are included in the MVP, so that we can carry on the
discussions alongside the MVP implementation effort. I think each one
of these features deserves a dedicated subsection.

Many thanks,
Rong


On Wed, Dec 12, 2018 at 1:14 AM Shuyi Chen  wrote:

> Hi all,
>
> I summarize the MVP based on the features that we agreed upon. For table
> update mode and custom watermark strategy and ts extractor, I found there
> are some discussions, so I decided to leave them out for the MVP.
> For row/map/array data type, I think we can add it as well if everyone
> agrees.
>
>
> 4) Event-Time Attributes and Watermarks
> Cited from SQL Server 2017 document (
>
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
> ),
> "A
> computed column is a virtual column that is not physically stored in the
> table, unless the column is marked PERSISTED. A computed column expression
> can use data from other columns to calculate a value for the column to
> which it belongs." I think we can also introduce the PERSISTED keyword
> for computed column to indicate that the field can be stored back to the
> table, i.e. ts AS SYSTEMROWTIME() PERSISTED.
>
> 3) SOURCE / SINK / BOTH
> GRANT/INVOKE sounds like a more standard option than adding a property to
> CREATE TABLE to manage the ACL/permission. The ACL can be stored somewhere
> in a database, and allow/disallow access to a dynamic table depending on
> whether it's an "INSERT INTO" or a "SELECT".
>
> I can volunteer to put the discussion as a FLIP.  I can try to summarize
> the current discussion, and share edit permission with you to collaborate
> on the documents. After we finalized the doc, we can publish it as a FLIP.
> What do you think?
>
> Shuyi
>
>
>
> On Tue, Dec 11, 2018 at 9:13 AM Timo Walther  wrote:
>
> > Hi all,
> >
> > thanks for summarizing the discussion @Shuyi. I think we need to include
> > the "table update mode" problem as it might not be changed easily in the
> > future. Regarding "support row/map/array data type", I don't see a
> > problem why we should not support them now as the data types are already
> > included in the runtime. The "support custom timestamp extractor" is
> > solved by the computed columns approach. The "custom watermark strategy"
> > can be added by supplying a class name as parameter in my opinion.
> >
> > Regarding the comments of Lin and Jark:
> >
> > @Lin: Instantiating a TableSource/Sink should not cost much, but we
> > should not mix catalog discussion and DDL at this point.
> >
> > 4) Event-Time Attributes and Watermarks
> > 4.b) Regarding `ts AS SYSTEMROWTIME()` and Lin's comment about "will
> > violate the rule": there is no explicit rule of doing so. Computed
> > columns are also not standard compliant; if we can use information that
> > is encoded in constraints we should use it. Adding more and more
> > top-level properties makes the interaction with connectors more
> > difficult. An additional HEADER keyword sounds too connector-specific
> > and also not SQL compliant to me.
> >
> > 3) SOURCE / SINK / BOTH
> > GRANT/INVOKE are mutating an existing table, right? In my opinion,
> > independent of SQL databases but focusing on Flink user requirements, a
> > CREATE TABLE statement should be an immutable definition of a connection
> > to an external system.
> >
> > 7) Table Update Mode
> > As far as I can see, the only thing missing for enabling all table modes
> > is the declaration of a change flag. We could introduce a new keyword
> > here similar to WATERMARK:
> >
> > CREA

Re: [DISCUSS] Flink SQL DDL Design

2018-12-12 Thread Teja MVSR
Hi all,

I have been following this thread and it looks interesting. If I can be
of any help, please let me know.

Thanks,
Teja

On Wed, Dec 12, 2018, 4:31 AM Kurt Young wrote:
> Sounds great, thanks for the effort, Shuyi.
>
> Best,
> Kurt
>
>
> On Wed, Dec 12, 2018 at 5:14 PM Shuyi Chen  wrote:
>
> > Hi all,
> >
> > I summarize the MVP based on the features that we agreed upon. For table
> > update mode and custom watermark strategy and ts extractor, I found there
> > are some discussions, so I decided to leave them out for the MVP.
> > For row/map/array data type, I think we can add it as well if everyone
> > agrees.
> >
> >
> > 4) Event-Time Attributes and Watermarks
> > Cited from SQL Server 2017 document (
> >
> >
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
> > ),
> > "A
> > computed column is a virtual column that is not physically stored in the
> > table, unless the column is marked PERSISTED. A computed column
> expression
> > can use data from other columns to calculate a value for the column to
> > which it belongs." I think we can also introduce the PERSISTED
> keyword
> > for computed column to indicate that the field can be stored back to the
> > table, i.e. ts AS SYSTEMROWTIME() PERSISTED.
> >
> > 3) SOURCE / SINK / BOTH
> > GRANT/INVOKE sounds like a more standard option than adding a property to
> > CREATE TABLE to manage the ACL/permission. The ACL can be stored
> somewhere
> > in a database, and allow/disallow access to a dynamic table depending on
> > whether it's an "INSERT INTO" or a "SELECT".
> >
> > I can volunteer to put the discussion as a FLIP.  I can try to summarize
> > the current discussion, and share edit permission with you to collaborate
> > on the documents. After we finalized the doc, we can publish it as a
> FLIP.
> > What do you think?
> >
> > Shuyi
> >
> >
> >
> > On Tue, Dec 11, 2018 at 9:13 AM Timo Walther  wrote:
> >
> > > Hi all,
> > >
> > > thanks for summarizing the discussion @Shuyi. I think we need to
> include
> > > the "table update mode" problem as it might not be changed easily in
> the
> > > future. Regarding "support row/map/array data type", I don't see a
> > > problem why we should not support them now as the data types are
> already
> > > included in the runtime. The "support custom timestamp extractor" is
> > > solved by the computed columns approach. The "custom watermark
> strategy"
> > > can be added by supplying a class name as parameter in my opinion.
> > >
> > > Regarding the comments of Lin and Jark:
> > >
> > > @Lin: Instantiating a TableSource/Sink should not cost much, but we
> > > should not mix catalog discussion and DDL at this point.
> > >
> > > 4) Event-Time Attributes and Watermarks
> > > 4.b) Regarding `ts AS SYSTEMROWTIME()` and Lin's comment about "will
> > > violate the rule": there is no explicit rule of doing so. Computed
> > > columns are also not standard compliant; if we can use information that
> > > is encoded in constraints we should use it. Adding more and more
> > > top-level properties makes the interaction with connectors more
> > > difficult. An additional HEADER keyword sounds too connector-specific
> > > and also not SQL compliant to me.
> > >
> > > 3) SOURCE / SINK / BOTH
> > > GRANT/INVOKE are mutating an existing table, right? In my opinion,
> > > independent of SQL databases but focusing on Flink user requirements, a
> > > CREATE TABLE statement should be an immutable definition of a
> connection
> > > to an external system.
> > >
> > > 7) Table Update Mode
> > > As far as I can see, the only thing missing for enabling all table
> modes
> > > is the declaration of a change flag. We could introduce a new keyword
> > > here similar to WATERMARK:
> > >
> > > CREATE TABLE output_kafka_t1(
> > >id bigint,
> > >msg varchar,
> > >CHANGE_FLAG FOR isRetraction
> > > ) WITH (
> > >type=kafka
> > >,...
> > > );
> > >
> > > CREATE TABLE output_kafka_t1(
> > >CHANGE_FLAG FOR isUpsert
> > >id bigint,
> > >msg varchar,
> > >PRIMARY_KEY(id)
> > > ) WITH (
> > >type=kafka
> > >,...
> > > );
> > >
> > > What do you think?
> > >
> > > @Jark: We should definitely stage the discussions and mention the
> > > opinions and advantages/disadvantages that have been proposed already
> in
> > > the FLIP.
> > >
> > > Regards,
> > > Timo
> > >
> > > On 10.12.18 08:10, Jark Wu wrote:
> > > > Hi all,
> > > >
> > > > It's great to see we have an agreement on MVP.
> > > >
> > > > 4.b) Ingesting and writing timestamps to systems.
> > > > I would treat the field as a physical column not a virtual column. If
> > we
> > > > treat it as a computed column, it will be confusing that the behavior is
> > > > different when it is a source or sink.
> > > > When it is a physical column, the behavior could be unified. Then the
> > > > problem is how to map the field to the kafka message timestamp?
> > > > One is Lin proposed above and i

Re: [DISCUSS] Flink SQL DDL Design

2018-12-12 Thread Kurt Young
Sounds great, thanks for the effort, Shuyi.

Best,
Kurt


On Wed, Dec 12, 2018 at 5:14 PM Shuyi Chen  wrote:

> Hi all,
>
> I summarize the MVP based on the features that we agreed upon. For table
> update mode and custom watermark strategy and ts extractor, I found there
> are some discussions, so I decided to leave them out for the MVP.
> For row/map/array data type, I think we can add it as well if everyone
> agrees.
>
>
> 4) Event-Time Attributes and Watermarks
> Cited from SQL Server 2017 document (
>
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017
> ),
> "A
> computed column is a virtual column that is not physically stored in the
> table, unless the column is marked PERSISTED. A computed column expression
> can use data from other columns to calculate a value for the column to
> which it belongs." I think we can also introduce the PERSISTED keyword
> for computed column to indicate that the field can be stored back to the
> table, i.e. ts AS SYSTEMROWTIME() PERSISTED.
>
> 3) SOURCE / SINK / BOTH
> GRANT/INVOKE sounds like a more standard option than adding a property to
> CREATE TABLE to manage the ACL/permission. The ACL can be stored somewhere
> in a database, and allow/disallow access to a dynamic table depending on
> whether it's an "INSERT INTO" or a "SELECT".
>
> I can volunteer to put the discussion as a FLIP.  I can try to summarize
> the current discussion, and share edit permission with you to collaborate
> on the documents. After we finalized the doc, we can publish it as a FLIP.
> What do you think?
>
> Shuyi
>
>
>
> On Tue, Dec 11, 2018 at 9:13 AM Timo Walther  wrote:
>
> > Hi all,
> >
> > thanks for summarizing the discussion @Shuyi. I think we need to include
> > the "table update mode" problem as it might not be changed easily in the
> > future. Regarding "support row/map/array data type", I don't see a
> > problem why we should not support them now as the data types are already
> > included in the runtime. The "support custom timestamp extractor" is
> > solved by the computed columns approach. The "custom watermark strategy"
> > can be added by supplying a class name as parameter in my opinion.
> >
> > Regarding the comments of Lin and Jark:
> >
> > @Lin: Instantiating a TableSource/Sink should not cost much, but we
> > should not mix catalog discussion and DDL at this point.
> >
> > 4) Event-Time Attributes and Watermarks
> > 4.b) Regarding `ts AS SYSTEMROWTIME()` and Lin's comment about "will
> > violate the rule": there is no explicit rule of doing so. Computed
> > columns are also not standard compliant; if we can use information that
> > is encoded in constraints we should use it. Adding more and more
> > top-level properties makes the interaction with connectors more
> > difficult. An additional HEADER keyword sounds too connector-specific
> > and also not SQL compliant to me.
> >
> > 3) SOURCE / SINK / BOTH
> > GRANT/INVOKE are mutating an existing table, right? In my opinion,
> > independent of SQL databases but focusing on Flink user requirements, a
> > CREATE TABLE statement should be an immutable definition of a connection
> > to an external system.
> >
> > 7) Table Update Mode
> > As far as I can see, the only thing missing for enabling all table modes
> > is the declaration of a change flag. We could introduce a new keyword
> > here similar to WATERMARK:
> >
> > CREATE TABLE output_kafka_t1(
> >id bigint,
> >msg varchar,
> >CHANGE_FLAG FOR isRetraction
> > ) WITH (
> >type=kafka
> >,...
> > );
> >
> > CREATE TABLE output_kafka_t1(
> >CHANGE_FLAG FOR isUpsert
> >id bigint,
> >msg varchar,
> >PRIMARY_KEY(id)
> > ) WITH (
> >type=kafka
> >,...
> > );
> >
> > What do you think?
> >
> > @Jark: We should definitely stage the discussions and mention the
> > opinions and advantages/disadvantages that have been proposed already in
> > the FLIP.
> >
> > Regards,
> > Timo
> >
> > On 10.12.18 08:10, Jark Wu wrote:
> > > Hi all,
> > >
> > > It's great to see we have an agreement on MVP.
> > >
> > > 4.b) Ingesting and writing timestamps to systems.
> > > I would treat the field as a physical column not a virtual column. If
> we
> > > treat it as a computed column, it will be confusing that the behavior is
> > > different when it is a source or sink.
> > > When it is a physical column, the behavior could be unified. Then the
> > > problem is how to map the field to the kafka message timestamp?
> > > One is Lin proposed above and is also used in KSQL[1]. Another idea is
> > > introducing a HEADER column which strictly map by name to the fields in
> > > message header.
> > > For example,
> > >
> > > CREATE TABLE output_kafka_t1(
> > >id bigint,
> > >ts timestamp HEADER,
> > >msg varchar
> > > ) WITH (
> > >type=kafka
> > >,...
> > > );
> > >
> > > This is used in Alibaba but not included in the DDL draft. It will
> > further
> > > extend the S

Re: [DISCUSS] Flink SQL DDL Design

2018-12-12 Thread Shuyi Chen
Hi all,

I summarize the MVP based on the features that we agreed upon. For table
update mode and custom watermark strategy and ts extractor, I found there
are some discussions, so I decided to leave them out for the MVP.
For row/map/array data type, I think we can add it as well if everyone
agrees.


4) Event-Time Attributes and Watermarks
Cited from SQL Server 2017 document (
https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-2017),
"A
computed column is a virtual column that is not physically stored in the
table, unless the column is marked PERSISTED. A computed column expression
can use data from other columns to calculate a value for the column to
which it belongs." I think we can also introduce the PERSISTED keyword
for computed columns to indicate that the field can be stored back to the
table, i.e. ts AS SYSTEMROWTIME() PERSISTED.
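A rough sketch of how that could look in a CREATE TABLE (illustrative only;
neither SYSTEMROWTIME() nor PERSISTED is finalized syntax):

```
CREATE TABLE output_kafka_t1 (
  id BIGINT,
  msg VARCHAR,
  -- computed column backed by the system timestamp; PERSISTED would indicate
  -- that the value can also be stored back to the external system when writing
  ts AS SYSTEMROWTIME() PERSISTED
) WITH (
  type=kafka
  ,...
);
```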

3) SOURCE / SINK / BOTH
GRANT/INVOKE sounds like a more standard option than adding a property to
CREATE TABLE to manage the ACL/permission. The ACL can be stored somewhere
in a database, and allow/disallow access to a dynamic table depending on
whether it's an "INSERT INTO" or a "SELECT".
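For illustration, such permissions could then be expressed with standard DCL
statements along these lines (a sketch only; table and role names are
hypothetical, and this is not part of the MVP proposal):

```
GRANT SELECT ON TABLE kafka_events TO analyst_role;   -- may be used as a source
GRANT INSERT ON TABLE kafka_events TO etl_role;       -- may be used as a sink
REVOKE INSERT ON TABLE kafka_events FROM analyst_role;
```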

I can volunteer to put the discussion as a FLIP.  I can try to summarize
the current discussion, and share edit permission with you to collaborate
on the documents. After we finalized the doc, we can publish it as a FLIP.
What do you think?

Shuyi



On Tue, Dec 11, 2018 at 9:13 AM Timo Walther  wrote:

> Hi all,
>
> thanks for summarizing the discussion @Shuyi. I think we need to include
> the "table update mode" problem as it might not be changed easily in the
> future. Regarding "support row/map/array data type", I don't see a
> problem why we should not support them now as the data types are already
> included in the runtime. The "support custom timestamp extractor" is
> solved by the computed columns approach. The "custom watermark strategy"
> can be added by supplying a class name as parameter in my opinion.
>
> Regarding the comments of Lin and Jark:
>
> @Lin: Instantiating a TableSource/Sink should not cost much, but we
> should not mix catalog discussion and DDL at this point.
>
> 4) Event-Time Attributes and Watermarks
> 4.b) Regarding `ts AS SYSTEMROWTIME()` and Lin's comment about "will
> violate the rule": there is no explicit rule of doing so. Computed
> columns are also not standard compliant; if we can use information that
> is encoded in constraints we should use it. Adding more and more
> top-level properties makes the interaction with connectors more
> difficult. An additional HEADER keyword sounds too connector-specific
> and also not SQL compliant to me.
>
> 3) SOURCE / SINK / BOTH
> GRANT/INVOKE are mutating an existing table, right? In my opinion,
> independent of SQL databases but focusing on Flink user requirements, a
> CREATE TABLE statement should be an immutable definition of a connection
> to an external system.
>
> 7) Table Update Mode
> As far as I can see, the only thing missing for enabling all table modes
> is the declaration of a change flag. We could introduce a new keyword
> here similar to WATERMARK:
>
> CREATE TABLE output_kafka_t1(
>id bigint,
>msg varchar,
>CHANGE_FLAG FOR isRetraction
> ) WITH (
>type=kafka
>,...
> );
>
> CREATE TABLE output_kafka_t1(
>CHANGE_FLAG FOR isUpsert
>id bigint,
>msg varchar,
>PRIMARY_KEY(id)
> ) WITH (
>type=kafka
>,...
> );
>
> What do you think?
>
> @Jark: We should definitely stage the discussions and mention the
> opinions and advantages/disadvantages that have been proposed already in
> the FLIP.
>
> Regards,
> Timo
>
> On 10.12.18 08:10, Jark Wu wrote:
> > Hi all,
> >
> > It's great to see we have an agreement on MVP.
> >
> > 4.b) Ingesting and writing timestamps to systems.
> > I would treat the field as a physical column not a virtual column. If we
> > treat it as a computed column, it will be confusing that the behavior is
> > different when it is a source or sink.
> > When it is a physical column, the behavior could be unified. Then the
> > problem is how to map the field to the kafka message timestamp?
> > One is Lin proposed above and is also used in KSQL[1]. Another idea is
> > introducing a HEADER column which strictly map by name to the fields in
> > message header.
> > For example,
> >
> > CREATE TABLE output_kafka_t1(
> >id bigint,
> >ts timestamp HEADER,
> >msg varchar
> > ) WITH (
> >type=kafka
> >,...
> > );
> >
> > This is used in Alibaba but not included in the DDL draft. It will
> further
> > > extend the SQL syntax, which we should be cautious about. What do you
> > > think about these two solutions?
> >
> > 4.d) Custom watermark strategies:
> > @Timo,  I don't have a strong opinion on this.
> >
> > 3) SOURCE/SINK/BOTH
> > Agree with Lin, GRANT/INVOKE [SELECT|UPDATE] ON TABLE is a clean and
> > standard way to manage the permission, which is also adopted by HIVE

Re: [DISCUSS] Flink SQL DDL Design

2018-12-11 Thread Timo Walther
BINARY ]
}

computedColumnDefinition ::=
columnName AS computedColumnExpression

tableConstraint ::=
{ PRIMARY KEY | UNIQUE }
(columnName [, columnName]* )

tableIndex ::=
[ UNIQUE ] INDEX indexName
 (columnName [, columnName]* )

rowTimeColumn ::=
columnName

tableOption ::=
property=value
offset ::=
positive integer (unit: ms)

CREATE VIEW

CREATE VIEW viewName
  [
( columnName [, columnName]* )
  ]
AS queryStatement;

CREATE FUNCTION

 CREATE FUNCTION functionName
  AS 'className';

 className ::=
fully qualified name


Shuyi Chen wrote on Wed, Nov 28, 2018, 3:28 AM:

Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc
first and start implementation w/o the unified connector API ready by
skipping some feature.

Xuefu, I like the idea of making Flink specific properties into generic
key-value pairs, so that it will make integration with Hive DDL (or others,
e.g. Beam DDL) easier.

I'll run a final pass over the design doc and finalize the design in the
next few days. And we can start creating tasks and collaborate on the
implementation. Thanks a lot for all the comments and inputs.

Cheers!
Shuyi

On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:


Yeah! I agree with Timo that DDL can actually proceed w/o being blocked by
connector API. We can leave the unknown out while defining the basic syntax.

@Shuyi

As commented in the doc, I think we can probably stick with simple syntax
with general properties, without extending the syntax too much that it
mimics the descriptor API.

Part of our effort on Flink-Hive integration is also to make DDL syntax
compatible with Hive's. The one in the current proposal seems to make our
effort more challenging.

We can help and collaborate. At this moment, I think we can finalize the
proposal and then we can divide the tasks for better collaboration.

Please let me know if there are any questions or suggestions.

Thanks,
Xuefu






--

Sender:Timo Walther 
Sent at:2018 Nov 27 (Tue) 16:21
Recipient:dev 
Subject:Re: [DISCUSS] Flink SQL DDL Design

Thanks for offering your help here, Xuefu. It would be great to move these
efforts forward. I agree that the DDL is somehow related to the unified
connector API design, but we can also start with the basic functionality now
and evolve the DDL during this release and next releases.

For example, we could identify the MVP DDL syntax that skips defining key
constraints and maybe even time attributes. This DDL could be used for batch
use cases, ETL, and materializing SQL queries (no time operations like
windows).

The unified connector API is high on our priority list for the 1.8 release.
I will try to update the document until mid of next week.

Regards,

Timo


On 27.11.18 08:08, Shuyi Chen wrote:

Thanks a lot, Xuefu. I was busy with some other stuff for the last 2 weeks,
but we are definitely interested in moving this forward. I think once the
unified connector API design [1] is done, we can finalize the DDL design as
well and start creating concrete subtasks to collaborate on the
implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:


Hi Shuyi,

I'm wondering if you folks still have the bandwidth working on this.

We have some dedicated resource and like to move this forward. We can
collaborate.

Thanks,

Xuefu




--

From: wenlong.lwl
Date: 2018-11-05 11:15:35
To:
Subject: Re: [DISCUSS] Flink SQL DDL Design

Hi, Shuyi, thanks for the proposal.

I have two concerns about the table ddl:

1. how about removing the source/sink mark from the ddl, because it is not
necessary; the framework determines whether the referred table is a source
or a sink according to the context of the query using the table. It will be
more convenient for users defining a table which can be both a source and a
sink, and more convenient for the catalog to persist and manage the meta
infos.

2. how about just keeping one pure string map as parameters for the table,
like

create table Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = 'kafka',
connector.property-version = '1',
connector.version = '0.10',
connector.properties.topic = 'test-kafka-topic',
connector.properties.startup-mode = 'latest-offset',
connector.properties.specific-offset = 'offset',
format.type = 'json'
format.prpertie

Re: [DISCUSS] Flink SQL DDL Design

2018-12-09 Thread Jark Wu
Hi all,

It's great to see we have an agreement on MVP.

4.b) Ingesting and writing timestamps to systems.
I would treat the field as a physical column not a virtual column. If we
treat it as a computed column, it will be confusing that the behavior is
different when it is a source or sink.
When it is a physical column, the behavior could be unified. Then the
problem is how to map the field to the kafka message timestamp?
One is what Lin proposed above, which is also used in KSQL[1]. Another idea is
introducing a HEADER column which strictly maps by name to the fields in
the message header.
For example,

CREATE TABLE output_kafka_t1(
  id bigint,
  ts timestamp HEADER,
  msg varchar
) WITH (
  type=kafka
  ,...
);

This is used in Alibaba but not included in the DDL draft. It will further
extend the SQL syntax, which we should be cautious about. What do you
think about these two solutions?
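For side-by-side comparison, the first solution (Lin's proposal, quoted later
in this thread) expresses the same mapping as a connector property instead of
a column modifier:

```
CREATE TABLE output_kafka_t1(
  id bigint,
  ts timestamp,
  msg varchar
) WITH (
  type=kafka,
  header.timestamp=ts
  ,...
);
```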

4.d) Custom watermark strategies:
@Timo,  I don't have a strong opinion on this.

3) SOURCE/SINK/BOTH
Agree with Lin, GRANT/INVOKE [SELECT|UPDATE] ON TABLE is a clean and
standard way to manage the permission, which is also adopted by HIVE[2] and
many databases.

[1]: https://docs.confluent.io/current/ksql/docs/tutorials/examples.html
[2]:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=45876173#Hivedeprecatedauthorizationmode/LegacyMode-Grant/RevokePrivileges

@Timo, it's great if someone can conclude the discussion and summarize into
a FLIP.
@Shuyi, Thanks a lot for putting it all together. The google doc looks good
to me, and I left some minor comments there.

Regarding the FLIP, I have some suggestions:
1. The FLIP can contain MILESTONE1 and FUTURE WORKS.
2. The MILESTONE1 is the MVP. It describes the MVP DDL syntax.
3. Separate FUTURE WORKS into two parts: UNDER DISCUSSION and ADOPTED. We
can derive MILESTONE2 from this easily when it is ready.

I summarized the Future Works based on Shuyi's work:

Adopted: (Should be described in detail here...)
1. support data type nullability and precision.
2. comment on table and columns.

Under Discussion: (Should briefly describe some options...)
1. Ingesting and writing timestamps to systems.
2. support custom watermark strategy.
3. support table update mode
4. support row/map/array data type
5. support schema derivation
6. support system versioned temporal table
7. support table index

We can continue the further discussion here, or separate it into another
DISCUSS topic if it is a sophisticated problem such as Table Update Mode or
Temporal Table.

Best,
Jark

On Mon, 10 Dec 2018 at 11:54, Lin Li  wrote:

> hi all,
> Thanks for your valuable input!
>
> 4) Event-Time Attributes and Watermarks:
> 4.b) @Fabian As you mentioned using a computed columns `ts AS
> SYSTEMROWTIME()`
> for writing out to kafka table sink will violate the rule that computed
> fields are not emitted.
> Since the timestamp column in kafka's header area is a specific
> materialization protocol,
> why don't we treat it as a connector property? For example:
> ```
> CREATE TABLE output_kafka_t1(
>   id bigint,
>   ts timestamp,
>   msg varchar
> ) WITH (
>   type=kafka,
>   header.timestamp=ts
>   ,...
> );
> ```
>
> 4d) For custom watermark strategies
> @Fabian Agree with you that opening another topic about this feature later.
>
> 3) SOURCE / SINK / BOTH
> I think the permissions and availabilities are two separate things;
> permissions
> can be managed well by using GRANT/INVOKE (you can call it DCL) solutions
> which are
> commonly used in different DBs. The permission part can be a new topic for
> later discussion, what do you think?
>
> For the availabilities, @Fabian @Timo  I have another question:
> does instantiating a TableSource/Sink cost much or have some other downsides?
> IMO, creating a new source/sink object via the constructor seems not costly.
> When receiving a DDL we should associate it with the catalog object
> (reusing an existing one or creating a new one).
> Am I missing something important?
>
> 5. Schema declaration:
> @Timo  yes, your concern about the user convenience is very important. But
> I haven't seen a clear way to solve this so far.
> Do we put it later and wait for more inputs from the community?
>
> Shuyi Chen wrote on Sat, Dec 8, 2018, 4:27 PM:
>
> > Hi all,
> >
> > Thanks a lot for the great discussion. I think we can continue the
> > discussion here while carving out a MVP so that the community can start
> > working on. Based on the discussion so far, I try to summarize what we
> will
> > do for the MVP:
> >
> > MVP
> >
> >1. support CREATE TABLE
> >    2. support existing data types in Flink SQL, ignore nullability and
> >precision
> >3. support table comments and column comments
> >4. support table constraint PRIMARY KEY and UNIQUE
> >5. support table properties using key-value pairs
> >6. support partitioned by
> >7. support computed column
> >8. support from-field and from-source timestamp extractors
> >9. support PERIODIC-ASCENDING, PERIODIC-BOUN

Re: [DISCUSS] Flink SQL DDL Design

2018-12-09 Thread Lin Li
hi all,
Thanks for your valuable input!

4) Event-Time Attributes and Watermarks:
4.b) @Fabian As you mentioned using a computed columns `ts AS
SYSTEMROWTIME()`
for writing out to kafka table sink will violate the rule that computed
fields are not emitted.
Since the timestamp column in kafka's header area is a specific
materialization protocol,
why don't we treat it as a connector property? For example:
```
CREATE TABLE output_kafka_t1(
  id bigint,
  ts timestamp,
  msg varchar
) WITH (
  type=kafka,
  header.timestamp=ts
  ,...
);
```

4d) For custom watermark strategies
@Fabian Agree with you that opening another topic about this feature later.

3) SOURCE / SINK / BOTH
I think the permissions and availabilities are two separate things;
permissions
can be managed well by using GRANT/INVOKE (you can call it DCL) solutions
which are
commonly used in different DBs. The permission part can be a new topic for
later discussion, what do you think?

For the availabilities, @Fabian @Timo  I have another question:
does instantiating a TableSource/Sink cost much or have some other downsides?
IMO, creating a new source/sink object via the constructor seems not costly.
When receiving a DDL we should associate it with the catalog object
(reusing an existing one or creating a new one).
Am I missing something important?

5. Schema declaration:
@Timo  yes, your concern about the user convenience is very important. But
I haven't seen a clear way to solve this so far.
Do we put it later and wait for more inputs from the community?

Shuyi Chen wrote on Sat, Dec 8, 2018, 4:27 PM:

> Hi all,
>
> Thanks a lot for the great discussion. I think we can continue the
> discussion here while carving out a MVP so that the community can start
> working on. Based on the discussion so far, I try to summarize what we will
> do for the MVP:
>
> MVP
>
>1. support CREATE TABLE
>    2. support existing data types in Flink SQL, ignore nullability and
>precision
>3. support table comments and column comments
>4. support table constraint PRIMARY KEY and UNIQUE
>5. support table properties using key-value pairs
>6. support partitioned by
>7. support computed column
>8. support from-field and from-source timestamp extractors
>9. support PERIODIC-ASCENDING, PERIODIC-BOUNDED, FROM-SOURCE watermark
>strategies.
>10. support a table property to allow explicit enforcement of
>read/write(source/sink) permission of a table
>
> I try to put up the DDL grammar (
>
> https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit?usp=sharing
> )
> based on the MVP features above and the previous design docs. Please take a
> look and comment on it.
>
>
> Also, I summarize the future Improvement on CREATE TABLE as the followings:
>
>1. support table update mode
>2. support data type nullability and precision
>3. support row/map/array data type
>4. support custom timestamp extractor and watermark strategy
>5. support schema derivation
>6. support system versioned temporal table
>7. support table index
>
> I suggest we first agree on the MVP feature list and the MVP grammar. And
> then we can either continue the discussion of the future improvements here,
> or create separate JIRAs for each item and discuss further in the JIRA.
> What do you guys think?
>
> Shuyi
>
> On Fri, Dec 7, 2018 at 7:54 AM Timo Walther  wrote:
>
> > Hi all,
> >
> > I think we are making good progress. Thanks for all the feedback so far.
> >
> > 3. Sources/Sinks:
> > It seems that I can not find supporters for explicit SOURCE/SINK
> > declaration so I'm fine with not using those keywords.
> > @Fabian: Maybe we don't even have to change the TableFactory interface
> > but just provide some helper functions in the TableFactoryService. This
> > would solve the availability problem, but the permission problem would
> > still not be solved. If you are fine with it, we could introduce a
> > property instead?
> >
> > 5. Schema declaration:
> > @Lin: We should find an agreement on this as it requires changes to the
> > TableFactory interface. We should minimize changes to this interface
> > because it is user-facing. Especially, if format schema and table schema
> > differ, the need for such a functionality is very important. Our goal is
> > to connect to existing infrastructure. For example, if we are using Avro
> > and the existing Avro format has enums but Flink SQL does not support
> > enums, it would be helpful to let the Avro format derive a table schema.
> > Otherwise you need to declare both schemas, which leads to CREATE TABLE
> > statements of 400 lines+.
> > I think the mentioned query:
> > CREATE TABLE (PRIMARY_KEY(a, c)) WITH (format.type = avro,
> > format.schema-file = "/my/avrofile.avsc")
> > is fine and should only be valid if the schema contains no non-computed
> > columns.
> >
> > 7. Table Update Mode:
> > After thinking about it again, I agree. The mode of the sinks can be
> > derived from the query and t

Re: [DISCUSS] Flink SQL DDL Design

2018-12-08 Thread Shuyi Chen
Hi all,

Thanks a lot for the great discussion. I think we can continue the
discussion here while carving out a MVP so that the community can start
working on. Based on the discussion so far, I try to summarize what we will
do for the MVP:

MVP

   1. support CREATE TABLE
   2. support existing data types in Flink SQL, ignore nullability and
   precision
   3. support table comments and column comments
   4. support table constraint PRIMARY KEY and UNIQUE
   5. support table properties using key-value pairs
   6. support partitioned by
   7. support computed column
   8. support from-field and from-source timestamp extractors
   9. support PERIODIC-ASCENDING, PERIODIC-BOUNDED, FROM-SOURCE watermark
   strategies.
   10. support a table property to allow explicit enforcement of
   read/write(source/sink) permission of a table

I try to put up the DDL grammar (
https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit?usp=sharing)
based on the MVP features above and the previous design docs. Please take a
look and comment on it.
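To give a feel for the MVP scope, here is a rough sketch of a CREATE TABLE
exercising several of the features above (illustrative only; the authoritative
grammar is in the linked doc, and all names here are hypothetical):

```
CREATE TABLE clicks (
  user_id BIGINT COMMENT 'user identifier',
  page VARCHAR,
  click_time TIMESTAMP,
  -- computed column (feature 7)
  click_date AS DATE_FORMAT(click_time, 'yyyy-MM-dd'),
  PRIMARY KEY (user_id, click_time)
) COMMENT 'raw click stream'
PARTITIONED BY (click_date)
WITH (
  connector.type = 'kafka',
  format.type = 'json'
);
```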


Also, I summarize the future Improvement on CREATE TABLE as the followings:

   1. support table update mode
   2. support data type nullability and precision
   3. support row/map/array data type
   4. support custom timestamp extractor and watermark strategy
   5. support schema derivation
   6. support system versioned temporal table
   7. support table index

I suggest we first agree on the MVP feature list and the MVP grammar. And
then we can either continue the discussion of the future improvements here,
or create separate JIRAs for each item and discuss further in the JIRA.
What do you guys think?

Shuyi

On Fri, Dec 7, 2018 at 7:54 AM Timo Walther  wrote:

> Hi all,
>
> I think we are making good progress. Thanks for all the feedback so far.
>
> 3. Sources/Sinks:
> It seems that I can not find supporters for explicit SOURCE/SINK
> declaration so I'm fine with not using those keywords.
> @Fabian: Maybe we don't even have to change the TableFactory interface
> but just provide some helper functions in the TableFactoryService. This
> would solve the availability problem, but the permission problem would
> still not be solved. If you are fine with it, we could introduce a
> property instead?
>
> 5. Schema declaration:
> @Lin: We should find an agreement on this as it requires changes to the
> TableFactory interface. We should minimize changes to this interface
> because it is user-facing. Especially, if format schema and table schema
> differ, the need for such a functionality is very important. Our goal is
> to connect to existing infrastructure. For example, if we are using Avro
> and the existing Avro format has enums but Flink SQL does not support
> enums, it would be helpful to let the Avro format derive a table schema.
> Otherwise you need to declare both schemas, which leads to CREATE TABLE
> statements of 400 lines+.
> I think the mentioned query:
> CREATE TABLE (PRIMARY_KEY(a, c)) WITH (format.type = avro,
> format.schema-file = "/my/avrofile.avsc")
> is fine and should only be valid if the schema contains no non-computed
> columns.
>
> 7. Table Update Mode:
> After thinking about it again, I agree. The mode of the sinks can be
> derived from the query and the existence of a PRIMARY KEY declaration.
> But Fabian raised a very good point. How do we deal with sources? Shall
> we introduce a new keywords similar to WATERMARKS such that a
> upsert/retract flag is not part of the visible schema?
>
> 4a. How to mark a field as attribute?
> @Jark: Thanks for the explanation of the WATERMARK clause semantics.
> This is a nice way of marking existing fields. This sounds good to me.
>
> 4c) WATERMARK as constraint
> I'm fine with leaving the WATERMARK clause in the schema definition.
>
> 4d) Custom watermark strategies:
> I would already think about custom watermark strategies as the current
> descriptor design already supports this. ScalarFunctions don't work, as
> a PeriodicWatermarkAssigner has different semantics. Why not simply
> enter a full class name here as it is done in the current design?
>
> 4.b) Ingesting and writing timestamps to systems (like Kafka)
> @Fabian: Yes, your suggestion sounds good to me. This behavior would be
> similar to our current `timestamps: from-source` design.
>
> Once our discussion has found a conclusion, I would like to volunteer
> and summarize the outcome of this mailing thread. It nicely aligns with
> the update work on the connector improvements document (that I wanted to
> do anyway) and the ongoing external catalog discussion. Furthermore, I
> would also want to propose how to change existing interfaces by keeping
> the DDL, connector improvements, and external catalog support in mind.
> Would that be ok for you?
>
> Thanks,
> Timo
>
>
>
> On 07.12.18 14:48, Fabian Hueske wrote:
> > Hi all,
> >
> > Thanks for the discussion.
> > I'd like to share my point of view as well.
> >
> > 4) Event-Time Att

Re: [DISCUSS] Flink SQL DDL Design

2018-12-07 Thread Timo Walther
PROCTIME(), which defines a proctime field named “pt” in the schema.

Looking forward to working with you guys!

Best,
Jark Wu


Lin Li wrote on Wed, Nov 28, 2018, 6:33 PM:


@Shuyi
Thanks for the proposal!  We have a simple DDL implementation (extends
Calcite's parser) which has been running for almost two years in production
and works well.
I think the most valued thing we've learned is keeping simplicity and
standard compliance.
Here's the approximate grammar, FYI
CREATE TABLE

CREATE TABLE tableName(
   columnDefinition [, columnDefinition]*
   [ computedColumnDefinition [,

computedColumnDefinition]*

]

   [ tableConstraint [, tableConstraint]* ]
   [ tableIndex [, tableIndex]* ]
   [ PERIOD FOR SYSTEM_TIME ]
   [ WATERMARK watermarkName FOR rowTimeColumn AS
withOffset(rowTimeColumn, offset) ] ) [ WITH ( tableOption [, tableOption]* ) ] [ ; ]

columnDefinition ::=
   columnName dataType [ NOT NULL ]

dataType  ::=
   {
 [ VARCHAR ]
 | [ BOOLEAN ]
 | [ TINYINT ]
 | [ SMALLINT ]
 | [ INT ]
 | [ BIGINT ]
 | [ FLOAT ]
 | [ DECIMAL ]
 | [ DOUBLE ]
 | [ DATE ]
 | [ TIME ]
 | [ TIMESTAMP ]
 | [ VARBINARY ]
   }

computedColumnDefinition ::=
   columnName AS computedColumnExpression

tableConstraint ::=
   { PRIMARY KEY | UNIQUE }
   (columnName [, columnName]* )

tableIndex ::=
   [ UNIQUE ] INDEX indexName
(columnName [, columnName]* )

rowTimeColumn ::=
   columnName

tableOption ::=
   property=value

offset ::=
   positive integer (unit: ms)

CREATE VIEW

CREATE VIEW viewName
 [
   ( columnName [, columnName]* )
 ]
   AS queryStatement;

CREATE FUNCTION

CREATE FUNCTION functionName
 AS 'className';

className ::=
   fully qualified name
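For readers following the grammar above, a sketch of a table definition that
instantiates it, including the WATERMARK clause (names are illustrative only):

```
CREATE TABLE order_events (
  order_id BIGINT NOT NULL,
  amount DOUBLE,
  order_time TIMESTAMP,
  PRIMARY KEY (order_id),
  WATERMARK wm FOR order_time AS withOffset(order_time, 5000)
) WITH (
  type=kafka
);
```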


Shuyi Chen wrote on Wed, Nov 28, 2018, 3:28 AM:

Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc
first and start implementation w/o the unified connector API ready by
skipping some feature.

Xuefu, I like the idea of making Flink specific properties into generic
key-value pairs, so that it will make integration with Hive DDL (or others,
e.g. Beam DDL) easier.

I'll run a final pass over the design doc and finalize the design in the
next few days. And we can start creating tasks and collaborate on the
implementation. Thanks a lot for all the comments and inputs.

Cheers!
Shuyi

On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu < xuef...@alibaba-inc.com> wrote:

Yeah! I agree with Timo that the DDL can actually proceed without being blocked by the connector API. We can leave the unknowns out while defining the basic syntax.

@Shuyi

As commented in the doc, I think we can probably stick with a simple syntax with general properties, without extending the syntax so much that it mimics the descriptor API.

Part of our effort on the Flink-Hive integration is also to make the DDL syntax compatible with Hive's. The one in the current proposal seems to make our effort more challenging.

We can help and collaborate. At this moment, I think we can finalize the proposal and then divide the tasks for better collaboration.

Please let me know if there are any questions or suggestions.

Thanks,
Xuefu






--

Sender:Timo Walther 
Sent at:2018 Nov 27 (Tue) 16:21
Recipient:dev 
Subject:Re: [DISCUSS] Flink SQL DDL Design

Thanks for offering your help here, Xuefu. It would be great to move these efforts forward. I agree that the DDL is somehow related to the unified connector API design, but we can also start with the basic functionality now and evolve the DDL during this release and the next releases.

For example, we could identify the MVP DDL syntax that skips defining key constraints and maybe even time attributes. This DDL could be used for batch use cases, ETL, and materializing SQL queries (no time operations like windows).

The unified connector API is high on our priority list for the 1.8 release. I will try to update the document until the middle of next week.

Regards,

Timo


On 27.11.18 at 08:08, Shuyi Chen wrote:

Thanks a lot, Xuefu. I was busy with some other stuff for the last 2 weeks, but we are definitely interested in moving this forward. I think once the unified connector API design [1] is done, we can finalize the DDL design as well and start creating concrete subtasks to collaborate on the implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < xuef...@alibaba-inc.com> wrote

Re: [DISCUSS] Flink SQL DDL Design

2018-12-07 Thread Fabian Hueske
Hi all,

Thanks for the discussion.
I'd like to share my point of view as well.

4) Event-Time Attributes and Watermarks:
4.a) I agree with Lin and Jark's proposal. Declaring a watermark on an
attribute declares it as an event-time attribute.
4.b) Ingesting and writing timestamps to systems (like Kafka). We could use
a special function like (ts AS SYSTEMROWTIME()). This function will
indicate that we read the timestamp directly from the system (and not the
data). We can also write the field back to the system when emitting the
table (violating the rule that computed fields are not emitted).
4c) I would treat WATERMARK similar to a PRIMARY KEY or UNIQUE KEY
constraint and therefore keep it in the schema definition.
4d) For custom watermark strategies, simple expressions or
ScalarFunctions won't be sufficient. Sophisticated approaches could collect
histograms, etc. But I think we can leave that out for later.

3) SOURCE / SINK / BOTH
As you said, there are two things to consider here: permission and
availability of a TableSource/TableSink.
I think that neither should be a reason to add a keyword at such a
sensitive position.
However, I also see Timo's point that it would be good to know up-front how
a table can be used without trying to instantiate a TableSource/Sink for a
query.
Maybe we can extend the TableFactory such that it provides information
about which sources/sinks it can provide.

7. Table Update Mode
Something that we definitely need to consider is how tables are ingested,
i.e., append, retract or upsert.
Especially, since upsert and retraction need a meta-data column that
indicates whether an event is an insert (or upsert) or a delete change.
This column needs to be identified somehow, most likely as part of the
input format. Ideally, this column should not be part of the table schema
(as it would always be true).
Emitting tables is not so much of an issue, as the properties of the table
tell us what to do (append-only/update, unique key y/n).

Best,
Fabian
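
For illustration, Fabian's SYSTEMROWTIME() suggestion above could look roughly like the following sketch. SYSTEMROWTIME() is only a proposed marker function at this point; the table, the other columns, and the property are hypothetical:

CREATE TABLE kafka_events (
  event_id BIGINT,
  payload VARCHAR,
  -- timestamp read from the Kafka record metadata rather than from the data;
  -- a connector could also write it back when emitting the table
  ts AS SYSTEMROWTIME()
) WITH (
  connector.type = 'kafka'   -- hypothetical property
);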


On Fri, 7 Dec 2018 at 10:39, Jark Wu wrote:

> Hi Timo,
>
> Thanks for your quickly feedback! Here are some of my thoughts:
>
> Append, upsert, and retract modes on sinks are also a very complex problem. I
> think append/upsert/retract is an ability of a table; users do not need to
> specify whether a table is used for append, retraction, or upsert. The query can
> choose which mode the sink is in. If an unbounded group-by is inserted into an
> append sink (a sink that only implements/supports append), an exception can be
> thrown. A more complex problem is: if we want to write retractions/upserts
> to Kafka, how do we encode the change flag (add or retract/delete) on the
> table? Maybe we should propose some protocol for the change flag encoding,
> but I don't have a clear idea about this right now.
>
> 3. Sources/Sinks: The source/sink tag is similar to the
> append/upsert/retract problem. Besides source/sink, we actually have stream
> source, stream sink, batch source, and batch sink, and the stream sink also
> includes the three modes append/upsert/retract. Should we put all these tags on
> the CREATE TABLE? IMO, the table's ability is defined by the table itself;
> users do not need to specify it. If it is only a readable table, an
> exception can be thrown when writing to it. As the source/sink tag can be
> omitted in CREATE TABLE, could we skip it and only support CREATE TABLE in
> the first version, and add it back in the future when we really need it? That
> keeps the API compatible and makes sure the MVP is what we have considered clearly.
>
> 4a. How to mark a field as attribute?
> The watermark definition includes two parts: use which field as time
> attribute and use what generate strategy.
> When we want to mark `ts` field as attribute: WATERMARK FOR `ts` AS OFFSET
> '5' SECOND.
> If we have a POJO{id, user, ts} field named "pojo", we can mark it like
> this: WATERMARK FOR pojo.ts AS OFFSET '5' SECOND
>
> 4b. Writing the timestamp to the Kafka message header
> Even though we can define multiple time attributes on a table, only one time
> attribute can be activated/used in a query (in a stream). When we enable
> `writeTimestamp`, the only attribute activated in the stream will be written to
> the Kafka message header. What I mean is that the timestamp in the StreamRecord
> is the time attribute in the stream.
>
> 4c. Yes. We introduced the WATERMARK keyword similar to the INDEX, PRIMARY
> KEY keywords.
>
> @Timo, Do you have any other advice or questions on the watermark syntax ?
> For example, the builtin strategy name: "BOUNDED WITH OFFSET" VS "OFFSET"
> VS ...
>
>
> Cheers,
> Jark
>
> On Fri, 7 Dec 2018 at 17:13, Lin Li  wrote:
>
> > Hi Timo,
> > Thanks for your feedback, here are some thoughts of mine:
> >
> > 3. Sources/Sinks:
> > "Let's assume an interactive CLI session, people should be able to list
> all
> > source table and sink tables to know upfront if they can use an INSERT
> INTO
> > here or not."
> > This requirement can be simply resolved by a document that lists all
> > supported source

Re: [DISCUSS] Flink SQL DDL Design

2018-12-07 Thread Jark Wu
derstand/comment/discuss
> > >>>>>>>>>>>> on your proposed DDL implementation.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Regards,
> > >>>>>>>>>>>> Shaoxuan
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Nov 28, 2018 at 7:39 PM Jark Wu 
> > >>>>>>> wrote:
> > >>>>>>>>>>>>> Hi Shuyi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for bringing up this discussion and the awesome
> > >>>>> work!
> > >>>>>> I
> > >>>>>>>> have
> > >>>>>>>>>>> left
> > >>>>>>>>>>>>> some comments in the doc.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I want to share something more about the watermark
> > >>>>> definition
> > >>>>>>>>> learned
> > >>>>>>>>>>>> from
> > >>>>>>>>>>>>> Alibaba.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>  1.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>  Table should be able to accept multiple watermark
> > >>>>>>> definition.
> > >>>>>>>>>>>>>  Because a table may have more than one rowtime field.
> > >>>>> For
> > >>>>>>>>> example,
> > >>>>>>>>>>> one
> > >>>>>>>>>>>>>  rowtime field is from existing field but missing in
> some
> > >>>>>>>>> records,
> > >>>>>>>>>>>>> another
> > >>>>>>>>>>>>>  is the ingestion timestamp in Kafka but not very
> > >>>>> accurate.
> > >>>>>>> In
> > >>>>>>>>> this
> > >>>>>>>>>>>> case,
> > >>>>>>>>>>>>>  user may define two rowtime fields with watermarks in
> > >>>>> the
> > >>>>>>>> Table
> > >>>>>>>>>> and
> > >>>>>>>>>>>>> choose
> > >>>>>>>>>>>>>  one in different situation.
> > >>>>>>>>>>>>>  2.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>  Watermark strategy always works with the rowtime field
> > >>>>>> together.
> > >>>>>>>>>>>>> Based on the two points mentioned above, I think we should
> > >>>>>>> combine
> > >>>>>>>>> the
> > >>>>>>>>>>>>> watermark strategy and rowtime field selection (i.e. which
> > >>>>>>>> existing
> > >>>>>>>>>>> field
> > >>>>>>>>>>>>> used to generate watermark) in one clause, so that we can
> > >>>>>>> define
> > >>>>>>>>>>> multiple
> > >>>>>>>>>>>>> watermarks in one Table.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here I will share the watermark syntax used in Alibaba
> > >>>>>> (simply
> > >>>>>>>>>>> modified):
> > >>>>>>>>>>>>> watermarkDefinition:
> > >>>>>>>>>>>>> WATERMARK [watermarkName] FOR <rowtime_field> AS
> > >>>>> wm_strategy
> > >>>>>>>>>>>>> wm_strategy:
> > >>>>>>>>>>>>> BOUNDED WITH OFFSET 'string' timeUnit
> > >>>>>>>>>>>>> |
&g

Re: [DISCUSS] Flink SQL DDL Design

2018-12-07 Thread Lin Li
t;>>>
> >>>>>>>>>>>>>  Table should be able to accept multiple watermark
> >>>>>>> definition.
> >>>>>>>>>>>>>  Because a table may have more than one rowtime field.
> >>>>> For
> >>>>>>>>> example,
> >>>>>>>>>>> one
> >>>>>>>>>>>>>  rowtime field is from existing field but missing in some
> >>>>>>>>> records,
> >>>>>>>>>>>>> another
> >>>>>>>>>>>>>  is the ingestion timestamp in Kafka but not very
> >>>>> accurate.
> >>>>>>> In
> >>>>>>>>> this
> >>>>>>>>>>>> case,
> >>>>>>>>>>>>>  user may define two rowtime fields with watermarks in
> >>>>> the
> >>>>>>>> Table
> >>>>>>>>>> and
> >>>>>>>>>>>>> choose
> >>>>>>>>>>>>>  one in different situation.
> >>>>>>>>>>>>>  2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  Watermark strategy always works with the rowtime field
> >>>>>> together.
> >>>>>>>>>>>>> Based on the two points mentioned above, I think we should
> >>>>>>> combine
> >>>>>>>>> the
> >>>>>>>>>>>>> watermark strategy and rowtime field selection (i.e. which
> >>>>>>>> existing
> >>>>>>>>>>> field
> >>>>>>>>>>>>> used to generate watermark) in one clause, so that we can
> >>>>>>> define
> >>>>>>>>>>> multiple
> >>>>>>>>>>>>> watermarks in one Table.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here I will share the watermark syntax used in Alibaba
> >>>>>> (simply
> >>>>>>>>>>> modified):
> >>>>>>>>>>>>> watermarkDefinition:
> >>>>>>>>>>>>> WATERMARK [watermarkName] FOR <rowtime_field> AS
> >>>>> wm_strategy
> >>>>>>>>>>>>> wm_strategy:
> >>>>>>>>>>>>> BOUNDED WITH OFFSET 'string' timeUnit
> >>>>>>>>>>>>> |
> >>>>>>>>>>>>> ASCENDING
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The “WATERMARK” keyword starts a watermark definition. The
> >>>>>>> “FOR”
> >>>>>>>>>>> keyword
> >>>>>>>>>>>>> defines which existing field used to generate watermark,
> >>>>> this
> >>>>>>>> field
> >>>>>>>>>>>> should
> >>>>>>>>>>>>> already exist in the schema (we can use computed-column to
> >>>>>>> derive
> >>>>>>>>>> from
> >>>>>>>>>>>>> other fields). The “AS” keyword defines watermark strategy,
> >>>>>>> such
> >>>>>>>> as
> >>>>>>>>>>>> BOUNDED
> >>>>>>>>>>>>> WITH OFFSET (covers almost all the requirements) and
> >>>>>> ASCENDING.
> >>>>>>>>>>>>> When the expected rowtime field does not exist in the
> >>>>> schema,
> >>>>>>> we
> >>>>>>>>> can
> >>>>>>>>>>> use
> >>>>>>>>>>>>> computed-column syntax to derive it from other existing
> >>>>>> fields
> >>>>>>>>> using
> >>>>>>>>>>>>> built-in functions or user defined functions. So the
> >>>>>>>>>> rowtime/watermark
> >>>>>>>>>>>
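
For illustration, the watermark clause variants quoted above (the Alibaba-style bounded-offset and ascending strategies, plus Timo's idea of naming a custom assigner class) might be written as in the following sketch; the table, the fields, the offset value, and the class name are hypothetical:

CREATE TABLE sensor_readings (
  sensor_id VARCHAR,
  reading DOUBLE,
  log_ts TIMESTAMP,
  ingest_ts TIMESTAMP,
  -- bounded out-of-orderness: the watermark trails log_ts by 5 seconds
  WATERMARK wm1 FOR log_ts AS BOUNDED WITH OFFSET '5' SECOND,
  -- strictly ascending timestamps
  WATERMARK wm2 FOR ingest_ts AS ASCENDING
) WITH (
  connector.type = 'kafka'   -- hypothetical property
);

-- a custom strategy could instead name an assigner class (hypothetical class name):
-- WATERMARK wm3 FOR log_ts AS 'com.example.MyPeriodicWatermarkAssigner'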

Re: [DISCUSS] Flink SQL DDL Design

2018-12-06 Thread Timo Walther
to collaborate on the implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu < xuef...@alibaba-inc.com> wrote:

Hi Shuyi,

I'm wondering if you folks still have the bandwidth to work on this.

We have some dedicated resources and would like to move this forward. We can collaborate.

Thanks,

Xuefu


--
From: wenlong.lwl
Date: 2018-11-05 11:15:35
To:
Subject: Re: [DISCUSS] Flink SQL DDL Design

Hi Shuyi, thanks for the proposal.

I have two concerns about the table DDL:

1. How about removing the source/sink mark from the DDL? It is not necessary: the framework determines whether the referred table is a source or a sink according to the context of the query using the table. It will be more convenient for users to define a table which can be both a source and a sink, and more convenient for the catalog to persist and manage the meta info.

2. How about just keeping one pure string map as the parameters for a table, like:

create table Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = 'kafka',
connector.property-version = '1',
connector.version = '0.10',
connector.properties.topic = 'test-kafka-topic',
connector.properties.startup-mode = 'latest-offset',
connector.properties.specific-offset = 'offset',
format.type = 'json',
format.properties.version = '1',
format.derive-schema = 'true'
);
Because:
1. In TableFactory, what users use is a string map of properties; defining parameters via a string map is the closest way to mapping how users use the parameters.
2. The table descriptor can be extended by users, like what is done for Kafka and JSON. It means that the parameter keys in the connector or format can differ between implementations; we cannot restrict the keys to a specified set, so we would need a map in the connector scope and a map in the connector.properties scope. Why not just give users a single map and let them put parameters in a format they like, which is also the simplest way to implement the DDL parser?
3. Whether we can define a format clause or not depends on the implementation of the connector; using a separate clause in the DDL may create the misunderstanding that we can combine the connectors with arbitrary formats, which may not actually work.

On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński < wos...@gmail.com> wrote:

+1, thanks for the proposal.

I guess this is a long-awaited change. It can vastly increase the functionality of the SQL Client, as it will be possible to use complex extensions like, for example, those provided by Apache Bahir [1].

Best Regards,
Dom.

[1]
https://github.com/apache/bahir-flink

On Sat, 3 Nov 2018 at 17:17, Rong Rong < walter...@gmail.com> wrote:

+1. Thanks for putting the proposal together, Shuyi.

DDL has been brought up a couple of times previously [1, 2]. Utilizing DDL will definitely be a great extension to the current Flink SQL to systematically support some of the previously brought up features, such as [3]. And it will also be beneficial to see the document closely aligned with the previous discussion on the unified SQL connector API [4].

I also left a few comments on the doc. Looking forward to the alignment with the other couple of efforts and to contributing to them!

Best,
Rong

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
[2]
http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
[3]
https://issues.apache.org/jira/browse/FLINK-8003
[4]
http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E

On Fri, Nov 2, 2018 at 10:22 AM Bowen Li < bowenl...@gmail.com> wrote:

Thanks Shuyi!

I left some comments there. I think the design of the SQL DDL and the Flink-Hive integration/external catalog enhancements will work closely with each other. I hope we are well aligned on the directions of the two designs, and I look forward to working with you guys on both!

Bowen


On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen < suez1...@gmail.com> wrote:

Hi everyone,

SQL DDL support has been a long-time ask from the community. Current Flink SQL supports only DML (e.g. SELECT and INSERT statements). In its current form, Flink SQL users still need to define/create table sources and sinks programmatically in Java/Scala. Also, in the SQL Client, without DDL support, the current implementation does not allow dynamic creation of tables, types, or functions with SQL, which adds friction to its adoption.


Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Jark Wu
t;>>> choose
> > >>>>>>>>>> one in different situation.
> > >>>>>>>>>> 2.
> > >>>>>>>>>>
> > >>>>>>>>>> Watermark strategy always works with the rowtime field
> > >>> together.
> > >>>>>>>>>> Based on the two points mentioned above, I think we should
> > >>>> combine
> > >>>>>> the
> > >>>>>>>>>> watermark strategy and rowtime field selection (i.e. which
> > >>>>> existing
> > >>>>>>>> field
> > >>>>>>>>>> used to generate watermark) in one clause, so that we can
> > >>>> define
> > >>>>>>>> multiple
> > >>>>>>>>>> watermarks in one Table.
> > >>>>>>>>>>
> > >>>>>>>>>> Here I will share the watermark syntax used in Alibaba
> > >>> (simply
> > >>>>>>>> modified):
> > >>>>>>>>>> watermarkDefinition:
> > >>>>>>>>>> WATERMARK [watermarkName] FOR <rowtime_field> AS
> > >> wm_strategy
> > >>>>>>>>>> wm_strategy:
> > >>>>>>>>>>BOUNDED WITH OFFSET 'string' timeUnit
> > >>>>>>>>>> |
> > >>>>>>>>>>ASCENDING
> > >>>>>>>>>>
> > >>>>>>>>>> The “WATERMARK” keyword starts a watermark definition. The
> > >>>> “FOR”
> > >>>>>>>> keyword
> > >>>>>>>>>> defines which existing field used to generate watermark,
> > >> this
> > >>>>> field
> > >>>>>>>>> should
> > >>>>>>>>>> already exist in the schema (we can use computed-column to
> > >>>> derive
> > >>>>>>> from
> > >>>>>>>>>> other fields). The “AS” keyword defines watermark strategy,
> > >>>> such
> > >>>>> as
> > >>>>>>>>> BOUNDED
> > >>>>>>>>>> WITH OFFSET (covers almost all the requirements) and
> > >>> ASCENDING.
> > >>>>>>>>>> When the expected rowtime field does not exist in the
> > >> schema,
> > >>>> we
> > >>>>>> can
> > >>>>>>>> use
> > >>>>>>>>>> computed-column syntax to derive it from other existing
> > >>> fields
> > >>>>>> using
> > >>>>>>>>>> built-in functions or user defined functions. So the
> > >>>>>>> rowtime/watermark
> > >>>>>>>>>> definition doesn’t need to care about “field-change”
> > >> strategy
> > >>>>>>>>>> (replace/add/from-field). And the proctime field definition
> > >>> can
> > >>>>>> also
> > >>>>>>> be
> > >>>>>>>>>> defined using computed-column. Such as pt as PROCTIME()
> > >> which
> > >>>>>>> defines a
> > >>>>>>>>>> proctime field named “pt” in the schema.
> > >>>>>>>>>>
> > >>>>>>>>>> Looking forward to working with you guys!
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Jark Wu
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Lin Li wrote on Wed, Nov 28, 2018 at 6:33 PM:
> > >>>>>>>>>>
> > >>>>>>>>>>> @Shuyi
> > >>>>>>>>>>> Thanks for the proposal!  We have a simple DDL
> > >>> implementation
> > >>>>>>>> (extends
> > >>>>>>>>>>> Calcite's parser) which been running for almost two years
> > >>> on
> > >>>>>>>> production
> > >>>>>>>>>>
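
For illustration, the pattern quoted above (deriving the rowtime from an existing field with a computed column, then declaring the watermark on it) might look like this sketch; TO_TIMESTAMP stands in for any built-in or user-defined conversion function, and all names and properties are hypothetical:

CREATE TABLE page_views (
  view_id BIGINT,
  log_time VARCHAR,
  -- derive the rowtime from an existing raw field via a computed column
  rowtime AS TO_TIMESTAMP(log_time, 'yyyy-MM-dd HH:mm:ss'),
  WATERMARK wm FOR rowtime AS BOUNDED WITH OFFSET '5' SECOND
) WITH (
  connector.type = 'kafka'   -- hypothetical property
);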

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Zhang, Xuefu

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Lin Li

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Timo Walther

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Jark Wu

Re: [DISCUSS] Flink SQL DDL Design

2018-12-05 Thread Shuyi Chen

Re: [DISCUSS] Flink SQL DDL Design

2018-12-04 Thread Jark Wu

Re: [DISCUSS] Flink SQL DDL Design

2018-12-04 Thread Shaoxuan Wang

Re: [DISCUSS] Flink SQL DDL Design

2018-12-04 Thread Jark Wu

Re: [DISCUSS] Flink SQL DDL Design

2018-11-29 Thread Shaoxuan Wang

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Lin Li

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Bowen Li

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Zhang, Xuefu
n the current proposal seems
> > making
> > > > our
> > > > > > effort more challenging.
> > > > > >
> > > > > > We can help and collaborate. At this moment, I think we can
> > finalize
> > > on
> > > > > > the proposal and then we can divide the tasks for better
> > > collaboration.
> > > > > >
> > > > > > Please let me know if there are  any questions or suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > > Xuefu
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> --------------------------
> > > > > > Sender:Timo Walther 
> > > > > > Sent at:2018 Nov 27 (Tue) 16:21
> > > > > > Recipient:dev 
> > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design
> > > > > >
> > > > > > Thanks for offering your help here, Xuefu. It would be great to
> > move
> > > > > > these efforts forward. I agree that the DDL is somehow releated
> to
> > > the
> > > > > > unified connector API design but we can also start with the basic
> > > > > > functionality now and evolve the DDL during this release and next
> > > > > releases.
> > > > > >
> > > > > > For example, we could identify the MVP DDL syntax that skips
> > defining
> > > > > > key constraints and maybe even time attributes. This DDL could be
> > > used
> > > > > > for batch usecases, ETL, and materializing SQL queries (no time
> > > > > > operations like windows).
> > > > > >
> > > > > > The unified connector API is high on our priority list for the
> 1.8
> > > > > > release. I will try to update the document until mid of next
> week.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Timo
> > > > > >
> > > > > >
> > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the
> > last 2
> > > > > > weeks,
> > > > > > > but we are definitely interested in moving this forward. I
> think
> > > once
> > > > > the
> > > > > > > unified connector API design [1] is done, we can finalize the
> DDL
> > > > > design
> > > > > > as
> > > > > > > well and start creating concrete subtasks to collaborate on the
> > > > > > > implementation with the community.
> > > > > > >
> > > > > > > Shuyi
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <
> > > > xuef...@alibaba-inc.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Shuyi,
> > > > > > >>
> > > > > > >> I'm wondering if you folks still have the bandwidth working on
> > > this.
> > > > > > >>
> > > > > > >> We have some dedicated resource and like to move this forward.
> > We
> > > > can
> > > > > > >> collaborate.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Xuefu
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > --
> > > > > > >> 发件人:wenlong.lwl
> > > > > > >> 日 期:2018年11月05日 11:15:35
> > > > > > >> 收件人:
> > > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> > > > > > >>
> > > > > > >> Hi, Shuyi, thanks for the proposal.
> > > > > > >>
> > > >

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Shuyi Chen
collaborate. At this moment, I think we can
> > finalize
> > > on
> > > > > > the proposal and then we can divide the tasks for better
> > > collaboration.
> > > > > >
> > > > > > Please let me know if there are  any questions or suggestions.
> > > > > >
> > > > > > Thanks,
> > > > > > Xuefu
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> --------------------------
> > > > > > Sender:Timo Walther 
> > > > > > Sent at:2018 Nov 27 (Tue) 16:21
> > > > > > Recipient:dev 
> > > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design
> > > > > >
> > > > > > Thanks for offering your help here, Xuefu. It would be great to
> > move
> > > > > > these efforts forward. I agree that the DDL is somehow releated
> to
> > > the
> > > > > > unified connector API design but we can also start with the basic
> > > > > > functionality now and evolve the DDL during this release and next
> > > > > releases.
> > > > > >
> > > > > > For example, we could identify the MVP DDL syntax that skips
> > defining
> > > > > > key constraints and maybe even time attributes. This DDL could be
> > > used
> > > > > > for batch usecases, ETL, and materializing SQL queries (no time
> > > > > > operations like windows).
> > > > > >
> > > > > > The unified connector API is high on our priority list for the
> 1.8
> > > > > > release. I will try to update the document until mid of next
> week.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Timo
> > > > > >
> > > > > >
> > > > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > > > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the
> > last 2
> > > > > > weeks,
> > > > > > > but we are definitely interested in moving this forward. I
> think
> > > once
> > > > > the
> > > > > > > unified connector API design [1] is done, we can finalize the
> DDL
> > > > > design
> > > > > > as
> > > > > > > well and start creating concrete subtasks to collaborate on the
> > > > > > > implementation with the community.
> > > > > > >
> > > > > > > Shuyi
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > > > >
> > > > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <
> > > > xuef...@alibaba-inc.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Shuyi,
> > > > > > >>
> > > > > > >> I'm wondering if you folks still have the bandwidth working on
> > > this.
> > > > > > >>
> > > > > > >> We have some dedicated resource and like to move this forward.
> > We
> > > > can
> > > > > > >> collaborate.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Xuefu
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > --
> > > > > > >> 发件人:wenlong.lwl
> > > > > > >> 日 期:2018年11月05日 11:15:35
> > > > > > >> 收件人:
> > > > > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> > > > > > >>
> > > > > > >> Hi, Shuyi, thanks for the proposal.
> > > > > > >>
> > > > > > >> I have two concerns about the table ddl:
> > > > > > >>
> > > > > > >> 1. how about remove the source/sink mark from the

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Jark Wu
> > > computedColumnDefinition ::=
> > > columnName AS computedColumnExpression
> > >
> > > tableConstraint ::=
> > > { PRIMARY KEY | UNIQUE }
> > > (columnName [, columnName]* )
> > >
> > > tableIndex ::=
> > > [ UNIQUE ] INDEX indexName
> > >  (columnName [, columnName]* )
> > >
> > > rowTimeColumn ::=
> > > columnName
> > >
> > > tableOption ::=
> > > property=value
> > > offset ::=
> > > positive integer (unit: ms)
> > >
> > > CREATE VIEW
> > >
> > > CREATE VIEW viewName
> > >   [
> > > ( columnName [, columnName]* )
> > >   ]
> > > AS queryStatement;
> > >
> > > CREATE FUNCTION
> > >
> > >  CREATE FUNCTION functionName
> > >   AS 'className';
> > >
> > >  className ::=
> > > fully qualified name
> > >
> > >
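A quick sketch of how the productions above could combine in one statement may
help; the table, columns, view, and UDF class name below are invented for
illustration, and the WITH (...) property list follows the flat key-value style
used elsewhere in this thread rather than an exact production from the doc:

CREATE TABLE OrderEvents (
  orderId BIGINT,
  userId BIGINT,
  price DECIMAL(10, 2),
  quantity INT,
  cost AS price * quantity,         -- computedColumnDefinition
  PRIMARY KEY (orderId),            -- tableConstraint
  UNIQUE INDEX idx_user (userId)    -- tableIndex
) WITH (
  connector.type = 'kafka',         -- tableOption: property=value
  format.type = 'json'
);

CREATE VIEW BigOrders (orderId, cost) AS
SELECT orderId, cost FROM OrderEvents WHERE cost > 100;

CREATE FUNCTION toUpper AS 'com.example.udf.ToUpper';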
> > > Shuyi Chen wrote on Wed, 28 Nov 2018 at 3:28 AM:
> > >
> > > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design
> > doc
> > > > first and start implementation w/o the unified connector API ready by
> > > > skipping some featue.
> > > >
> > > > Xuefu, I like the idea of making Flink specific properties into
> generic
> > > > key-value pairs, so that it will make integration with Hive DDL (or
> > > others,
> > > > e.g. Beam DDL) easier.
> > > >
> > > > I'll run a final pass over the design doc and finalize the design in
> > the
> > > > next few days. And we can start creating tasks and collaborate on the
> > > > implementation. Thanks a lot for all the comments and inputs.
> > > >
> > > > Cheers!
> > > > Shuyi
> > > >
> > > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu <
> xuef...@alibaba-inc.com>
> > > > wrote:
> > > >
> > > > > Yeah! I agree with Timo that DDL can actually proceed w/o being
> > blocked
> > > > by
> > > > > connector API. We can leave the unknown out while defining the
> basic
> > > > syntax.
> > > > >
> > > > > @Shuyi
> > > > >
> > > > > As commented in the doc, I think we can probably stick with simple
> > > syntax
> > > > > with general properties, without extending the syntax too much that
> > it
> > > > > mimics the descriptor API.
> > > > >
> > > > > Part of our effort on Flink-Hive integration is also to make DDL
> > syntax
> > > > > compatible with Hive's. The one in the current proposal seems
> making
> > > our
> > > > > effort more challenging.
> > > > >
> > > > > We can help and collaborate. At this moment, I think we can
> finalize
> > on
> > > > > the proposal and then we can divide the tasks for better
> > collaboration.
> > > > >
> > > > > Please let me know if there are  any questions or suggestions.
> > > > >
> > > > > Thanks,
> > > > > Xuefu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sender:Timo Walther 
> > > > > Sent at:2018 Nov 27 (Tue) 16:21
> > > > > Recipient:dev 
> > > > > Subject:Re: [DISCUSS] Flink SQL DDL Design
> > > > >
> > > > > Thanks for offering your help here, Xuefu. It would be great to
> move
> > > > > these efforts forward. I agree that the DDL is somehow releated to
> > the
> > > > > unified connector API design but we can also start with the basic
> > > > > functionality now and evolve the DDL during this release and next
> > > > releases.
> > > > >
> > > > > For example, we could identify the MVP DDL syntax that skips
> defining
> > > > > key constraints and maybe even time attributes. This DDL could be
> > used
> > > > > for batch usecases, ETL, and materializing SQL queries (no time
> > > > > operations like windows).
> > > > >
> > > > > The unified connector API is high on our priority list for the 1.8
> > > > > re

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Shaoxuan Wang
> > CREATE FUNCTION
> >
> >  CREATE FUNCTION functionName
> >   AS 'className';
> >
> >  className ::=
> > fully qualified name
> >
> >
> > Shuyi Chen wrote on Wed, 28 Nov 2018 at 3:28 AM:
> >
> > > Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design
> doc
> > > first and start implementation w/o the unified connector API ready by
> > > skipping some featue.
> > >
> > > Xuefu, I like the idea of making Flink specific properties into generic
> > > key-value pairs, so that it will make integration with Hive DDL (or
> > others,
> > > e.g. Beam DDL) easier.
> > >
> > > I'll run a final pass over the design doc and finalize the design in
> the
> > > next few days. And we can start creating tasks and collaborate on the
> > > implementation. Thanks a lot for all the comments and inputs.
> > >
> > > Cheers!
> > > Shuyi
> > >
> > > On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu 
> > > wrote:
> > >
> > > > Yeah! I agree with Timo that DDL can actually proceed w/o being
> blocked
> > > by
> > > > connector API. We can leave the unknown out while defining the basic
> > > syntax.
> > > >
> > > > @Shuyi
> > > >
> > > > As commented in the doc, I think we can probably stick with simple
> > syntax
> > > > with general properties, without extending the syntax too much that
> it
> > > > mimics the descriptor API.
> > > >
> > > > Part of our effort on Flink-Hive integration is also to make DDL
> syntax
> > > > compatible with Hive's. The one in the current proposal seems making
> > our
> > > > effort more challenging.
> > > >
> > > > We can help and collaborate. At this moment, I think we can finalize
> on
> > > > the proposal and then we can divide the tasks for better
> collaboration.
> > > >
> > > > Please let me know if there are  any questions or suggestions.
> > > >
> > > > Thanks,
> > > > Xuefu
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sender:Timo Walther 
> > > > Sent at:2018 Nov 27 (Tue) 16:21
> > > > Recipient:dev 
> > > > Subject:Re: [DISCUSS] Flink SQL DDL Design
> > > >
> > > > Thanks for offering your help here, Xuefu. It would be great to move
> > > > these efforts forward. I agree that the DDL is somehow releated to
> the
> > > > unified connector API design but we can also start with the basic
> > > > functionality now and evolve the DDL during this release and next
> > > releases.
> > > >
> > > > For example, we could identify the MVP DDL syntax that skips defining
> > > > key constraints and maybe even time attributes. This DDL could be
> used
> > > > for batch usecases, ETL, and materializing SQL queries (no time
> > > > operations like windows).
> > > >
> > > > The unified connector API is high on our priority list for the 1.8
> > > > release. I will try to update the document until mid of next week.
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Timo
> > > >
> > > >
> > > > Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > > > > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2
> > > > weeks,
> > > > > but we are definitely interested in moving this forward. I think
> once
> > > the
> > > > > unified connector API design [1] is done, we can finalize the DDL
> > > design
> > > > as
> > > > > well and start creating concrete subtasks to collaborate on the
> > > > > implementation with the community.
> > > > >
> > > > > Shuyi
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > >
> > > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <
> > xuef...@alibaba-inc.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Shuyi,
> > > > >>
> > > > >> I'm wondering if you fo

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Jark Wu
> >
> > > Yeah! I agree with Timo that DDL can actually proceed w/o being blocked
> > by
> > > connector API. We can leave the unknown out while defining the basic
> > syntax.
> > >
> > > @Shuyi
> > >
> > > As commented in the doc, I think we can probably stick with simple
> syntax
> > > with general properties, without extending the syntax too much that it
> > > mimics the descriptor API.
> > >
> > > Part of our effort on Flink-Hive integration is also to make DDL syntax
> > > compatible with Hive's. The one in the current proposal seems making
> our
> > > effort more challenging.
> > >
> > > We can help and collaborate. At this moment, I think we can finalize on
> > > the proposal and then we can divide the tasks for better collaboration.
> > >
> > > Please let me know if there are  any questions or suggestions.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > >
> > >
> > >
> > > --
> > > Sender:Timo Walther 
> > > Sent at:2018 Nov 27 (Tue) 16:21
> > > Recipient:dev 
> > > Subject:Re: [DISCUSS] Flink SQL DDL Design
> > >
> > > Thanks for offering your help here, Xuefu. It would be great to move
> > > these efforts forward. I agree that the DDL is somehow releated to the
> > > unified connector API design but we can also start with the basic
> > > functionality now and evolve the DDL during this release and next
> > releases.
> > >
> > > For example, we could identify the MVP DDL syntax that skips defining
> > > key constraints and maybe even time attributes. This DDL could be used
> > > for batch usecases, ETL, and materializing SQL queries (no time
> > > operations like windows).
> > >
> > > The unified connector API is high on our priority list for the 1.8
> > > release. I will try to update the document until mid of next week.
> > >
> > >
> > > Regards,
> > >
> > > Timo
> > >
> > >
> > > Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > > > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2
> > > weeks,
> > > > but we are definitely interested in moving this forward. I think once
> > the
> > > > unified connector API design [1] is done, we can finalize the DDL
> > design
> > > as
> > > > well and start creating concrete subtasks to collaborate on the
> > > > implementation with the community.
> > > >
> > > > Shuyi
> > > >
> > > > [1]
> > > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > >
> > > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu <
> xuef...@alibaba-inc.com>
> > > > wrote:
> > > >
> > > >> Hi Shuyi,
> > > >>
> > > >> I'm wondering if you folks still have the bandwidth working on this.
> > > >>
> > > >> We have some dedicated resource and like to move this forward. We
> can
> > > >> collaborate.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Xuefu
> > > >>
> > > >>
> > > >> --
> > > >> 发件人:wenlong.lwl
> > > >> 日 期:2018年11月05日 11:15:35
> > > >> 收件人:
> > > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> > > >>
> > > >> Hi, Shuyi, thanks for the proposal.
> > > >>
> > > >> I have two concerns about the table ddl:
> > > >>
> > > >> 1. how about remove the source/sink mark from the ddl, because it is
> > not
> > > >> necessary, the framework determine the table referred is a source
> or a
> > > sink
> > > >> according to the context of the query using the table. it will be
> more
> > > >> convenient for use defining a table which can be both a source and
> > sink,
> > > >> and more convenient for catalog to persistent and manage the meta
> > infos.
> > > >>
> > > >> 2. how about just keeping one pure string map as parameters for
> table,
> > > like
> > > >> create tabe Kafka10SourceTable (
> > > >

Re: [DISCUSS] Flink SQL DDL Design

2018-11-28 Thread Lin Li
 on the
> > > implementation with the community.
> > >
> > > Shuyi
> > >
> > > [1]
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > >
> > > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
> > > wrote:
> > >
> > >> Hi Shuyi,
> > >>
> > >> I'm wondering if you folks still have the bandwidth working on this.
> > >>
> > >> We have some dedicated resource and like to move this forward. We can
> > >> collaborate.
> > >>
> > >> Thanks,
> > >>
> > >> Xuefu
> > >>
> > >>
> > >> --
> > >> 发件人:wenlong.lwl
> > >> 日 期:2018年11月05日 11:15:35
> > >> 收件人:
> > >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> > >>
> > >> Hi, Shuyi, thanks for the proposal.
> > >>
> > >> I have two concerns about the table ddl:
> > >>
> > >> 1. how about remove the source/sink mark from the ddl, because it is
> not
> > >> necessary, the framework determine the table referred is a source or a
> > sink
> > >> according to the context of the query using the table. it will be more
> > >> convenient for use defining a table which can be both a source and
> sink,
> > >> and more convenient for catalog to persistent and manage the meta
> infos.
> > >>
> > >> 2. how about just keeping one pure string map as parameters for table,
> > like
> > >> create tabe Kafka10SourceTable (
> > >> intField INTEGER,
> > >> stringField VARCHAR(128),
> > >> longField BIGINT,
> > >> rowTimeField TIMESTAMP
> > >> ) with (
> > >> connector.type = ’kafka’,
> > >> connector.property-version = ’1’,
> > >> connector.version = ’0.10’,
> > >> connector.properties.topic = ‘test-kafka-topic’,
> > >> connector.properties.startup-mode = ‘latest-offset’,
> > >> connector.properties.specific-offset = ‘offset’,
> > >> format.type = 'json'
> > >> format.prperties.version=’1’,
> > >> format.derive-schema = 'true'
> > >> );
> > >> Because:
> > >> 1. in TableFactory, what user use is a string map properties, defining
> > >> parameters by string-map can be the closest way to mapping how user
> use
> > the
> > >> parameters.
> > >> 2. The table descriptor can be extended by user, like what is done in
> > Kafka
> > >> and Json, it means that the parameter keys in connector or format can
> be
> > >> different in different implementation, we can not restrict the key in
> a
> > >> specified set, so we need a map in connector scope and a map in
> > >> connector.properties scope. why not just give user a single map, let
> > them
> > >> put parameters in a format they like, which is also the simplest way
> to
> > >> implement DDL parser.
> > >> 3. whether we can define a format clause or not, depends on the
> > >> implementation of the connector, using different clause in DDL may
> make
> > a
> > >> misunderstanding that we can combine the connectors with arbitrary
> > formats,
> > >> which may not work actually.
> > >>
> > >> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński 
> wrote:
> > >>
> > >>> +1, Thanks for the proposal.
> > >>>
> > >>> I guess this is a long-awaited change. This can vastly increase the
> > >>> functionalities of the SQL Client as it will be possible to use
> complex
> > >>> extensions like for example those provided by Apache Bahir[1].
> > >>>
> > >>> Best Regards,
> > >>> Dom.
> > >>>
> > >>> [1]
> > >>> https://github.com/apache/bahir-flink
> > >>>
> > >>> sob., 3 lis 2018 o 17:17 Rong Rong  napisał(a):
> > >>>
> > >>>> +1. Thanks for putting the proposal together Shuyi.
> > >>>>
> > >>>> DDL has been brought up in a couple of times previously [1,2].
> > >> Utilizing
> > >>>> DDL will definitely be a great extension to the current Flink SQL to
> > >>>> systematically support some of the previously brought up features

Re: [DISCUSS] Flink SQL DDL Design

2018-11-27 Thread Shuyi Chen
Thanks a lot, Timo and Xuefu. Yes, I think we can finalize the design doc
first and start the implementation without the unified connector API being
ready, by skipping some features.

Xuefu, I like the idea of making Flink-specific properties into generic
key-value pairs, so that integration with Hive DDL (or others,
e.g. Beam DDL) becomes easier.
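One reason this helps, presumably, is that generic key-value pairs line up with
Hive's TBLPROPERTIES clause, so a property-based Flink DDL stays structurally
close to a Hive statement of roughly this shape (an illustrative example; the
table and property values are made up):

CREATE EXTERNAL TABLE user_clicks (
  user_id BIGINT,
  ip STRING
)
STORED AS ORC
LOCATION '/warehouse/user_clicks'
TBLPROPERTIES (
  'orc.compress' = 'SNAPPY'
);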

I'll run a final pass over the design doc and finalize the design in the
next few days. And we can start creating tasks and collaborate on the
implementation. Thanks a lot for all the comments and inputs.

Cheers!
Shuyi

On Tue, Nov 27, 2018 at 7:02 AM Zhang, Xuefu 
wrote:

> Yeah! I agree with Timo that DDL can actually proceed w/o being blocked by
> connector API. We can leave the unknown out while defining the basic syntax.
>
> @Shuyi
>
> As commented in the doc, I think we can probably stick with simple syntax
> with general properties, without extending the syntax too much that it
> mimics the descriptor API.
>
> Part of our effort on Flink-Hive integration is also to make DDL syntax
> compatible with Hive's. The one in the current proposal seems making our
> effort more challenging.
>
> We can help and collaborate. At this moment, I think we can finalize on
> the proposal and then we can divide the tasks for better collaboration.
>
> Please let me know if there are  any questions or suggestions.
>
> Thanks,
> Xuefu
>
>
>
>
> --
> Sender:Timo Walther 
> Sent at:2018 Nov 27 (Tue) 16:21
> Recipient:dev 
> Subject:Re: [DISCUSS] Flink SQL DDL Design
>
> Thanks for offering your help here, Xuefu. It would be great to move
> these efforts forward. I agree that the DDL is somehow releated to the
> unified connector API design but we can also start with the basic
> functionality now and evolve the DDL during this release and next releases.
>
> For example, we could identify the MVP DDL syntax that skips defining
> key constraints and maybe even time attributes. This DDL could be used
> for batch usecases, ETL, and materializing SQL queries (no time
> operations like windows).
>
> The unified connector API is high on our priority list for the 1.8
> release. I will try to update the document until mid of next week.
>
>
> Regards,
>
> Timo
>
>
> Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> > Thanks a lot, Xuefu. I was busy for some other stuff for the last 2
> weeks,
> > but we are definitely interested in moving this forward. I think once the
> > unified connector API design [1] is done, we can finalize the DDL design
> as
> > well and start creating concrete subtasks to collaborate on the
> > implementation with the community.
> >
> > Shuyi
> >
> > [1]
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> >
> > On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
> > wrote:
> >
> >> Hi Shuyi,
> >>
> >> I'm wondering if you folks still have the bandwidth working on this.
> >>
> >> We have some dedicated resource and like to move this forward. We can
> >> collaborate.
> >>
> >> Thanks,
> >>
> >> Xuefu
> >>
> >>
> >> --
> >> 发件人:wenlong.lwl
> >> 日 期:2018年11月05日 11:15:35
> >> 收件人:
> >> 主 题:Re: [DISCUSS] Flink SQL DDL Design
> >>
> >> Hi, Shuyi, thanks for the proposal.
> >>
> >> I have two concerns about the table ddl:
> >>
> >> 1. how about remove the source/sink mark from the ddl, because it is not
> >> necessary, the framework determine the table referred is a source or a
> sink
> >> according to the context of the query using the table. it will be more
> >> convenient for use defining a table which can be both a source and sink,
> >> and more convenient for catalog to persistent and manage the meta infos.
> >>
> >> 2. how about just keeping one pure string map as parameters for table,
> like
> >> create tabe Kafka10SourceTable (
> >> intField INTEGER,
> >> stringField VARCHAR(128),
> >> longField BIGINT,
> >> rowTimeField TIMESTAMP
> >> ) with (
> >> connector.type = ’kafka’,
> >> connector.property-version = ’1’,
> >> connector.version = ’0.10’,
> >> connector.properties.topic = ‘test-kafka-topic’,
> >> connector.properties.startup-mode = ‘latest-offset’,
> >> connector.properties.specific-offset = ‘offset’,
> >> format.type = 'json'
> >> format.prperties.vers

Re: [DISCUSS] Flink SQL DDL Design

2018-11-27 Thread Zhang, Xuefu
Yeah! I agree with Timo that DDL can actually proceed without being blocked by
the connector API. We can leave the unknowns out while defining the basic syntax.

@Shuyi

As commented in the doc, I think we can probably stick with a simple syntax with
general properties, without extending the syntax so much that it mimics the
descriptor API.

Part of our effort on Flink-Hive integration is also to make the DDL syntax
compatible with Hive's. The one in the current proposal seems to make our effort
more challenging.

We can help and collaborate. At this moment, I think we can finalize the
proposal and then divide the tasks for better collaboration.

Please let me know if there are any questions or suggestions.

Thanks,
Xuefu




--
Sender:Timo Walther 
Sent at:2018 Nov 27 (Tue) 16:21
Recipient:dev 
Subject:Re: [DISCUSS] Flink SQL DDL Design

Thanks for offering your help here, Xuefu. It would be great to move
these efforts forward. I agree that the DDL is somewhat related to the
unified connector API design, but we can also start with the basic
functionality now and evolve the DDL during this release and the next releases.

For example, we could identify the MVP DDL syntax that skips defining
key constraints and maybe even time attributes. This DDL could be used
for batch use cases, ETL, and materializing SQL queries (no time
operations like windows).

The unified connector API is high on our priority list for the 1.8
release. I will try to update the document by the middle of next week.


Regards,

Timo


Am 27.11.18 um 08:08 schrieb Shuyi Chen:
> Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 weeks,
> but we are definitely interested in moving this forward. I think once the
> unified connector API design [1] is done, we can finalize the DDL design as
> well and start creating concrete subtasks to collaborate on the
> implementation with the community.
>
> Shuyi
>
> [1]
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
>
> On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
> wrote:
>
>> Hi Shuyi,
>>
>> I'm wondering if you folks still have the bandwidth working on this.
>>
>> We have some dedicated resource and like to move this forward. We can
>> collaborate.
>>
>> Thanks,
>>
>> Xuefu
>>
>>
>> ----------------------
>> From: wenlong.lwl
>> Date: 2018-11-05 11:15:35
>> To:
>> Subject: Re: [DISCUSS] Flink SQL DDL Design
>>
>> Hi, Shuyi, thanks for the proposal.
>>
>> I have two concerns about the table ddl:
>>
>> 1. how about remove the source/sink mark from the ddl, because it is not
>> necessary, the framework determine the table referred is a source or a sink
>> according to the context of the query using the table. it will be more
>> convenient for use defining a table which can be both a source and sink,
>> and more convenient for catalog to persistent and manage the meta infos.
>>
>> 2. how about just keeping one pure string map as parameters for table, like
>> create tabe Kafka10SourceTable (
>> intField INTEGER,
>> stringField VARCHAR(128),
>> longField BIGINT,
>> rowTimeField TIMESTAMP
>> ) with (
>> connector.type = ’kafka’,
>> connector.property-version = ’1’,
>> connector.version = ’0.10’,
>> connector.properties.topic = ‘test-kafka-topic’,
>> connector.properties.startup-mode = ‘latest-offset’,
>> connector.properties.specific-offset = ‘offset’,
>> format.type = 'json'
>> format.prperties.version=’1’,
>> format.derive-schema = 'true'
>> );
>> Because:
>> 1. in TableFactory, what user use is a string map properties, defining
>> parameters by string-map can be the closest way to mapping how user use the
>> parameters.
>> 2. The table descriptor can be extended by user, like what is done in Kafka
>> and Json, it means that the parameter keys in connector or format can be
>> different in different implementation, we can not restrict the key in a
>> specified set, so we need a map in connector scope and a map in
>> connector.properties scope. why not just give user a single map, let them
>> put parameters in a format they like, which is also the simplest way to
>> implement DDL parser.
>> 3. whether we can define a format clause or not, depends on the
>> implementation of the connector, using different clause in DDL may make a
>> misunderstanding that we can combine the connectors with arbitrary formats,
>> which may not work actually.
>>
>> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:
>

Re: [DISCUSS] Flink SQL DDL Design

2018-11-27 Thread Timo Walther
Thanks for offering your help here, Xuefu. It would be great to move
these efforts forward. I agree that the DDL is somewhat related to the
unified connector API design, but we can also start with the basic
functionality now and evolve the DDL during this release and the next releases.


For example, we could identify the MVP DDL syntax that skips defining
key constraints and maybe even time attributes. This DDL could be used
for batch use cases, ETL, and materializing SQL queries (no time
operations like windows).
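
To make that concrete, here is a minimal sketch of such an MVP statement,
assuming the flat property style already used in this thread (table, column,
and topic names are invented); note that it has no key constraints and no time
attributes:

CREATE TABLE PageViews (
  userId BIGINT,
  url VARCHAR(256),
  viewTime TIMESTAMP
) WITH (
  connector.type = 'kafka',
  connector.version = '0.10',
  connector.properties.topic = 'page-views',
  format.type = 'json'
);

A batch or ETL query could already read from or insert into such a table
without touching watermarks or primary keys.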


The unified connector API is high on our priority list for the 1.8
release. I will try to update the document by the middle of next week.



Regards,

Timo


Am 27.11.18 um 08:08 schrieb Shuyi Chen:

Thanks a lot, Xuefu. I was busy for some other stuff for the last 2 weeks,
but we are definitely interested in moving this forward. I think once the
unified connector API design [1] is done, we can finalize the DDL design as
well and start creating concrete subtasks to collaborate on the
implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
wrote:


Hi Shuyi,

I'm wondering if you folks still have the bandwidth working on this.

We have some dedicated resource and like to move this forward. We can
collaborate.

Thanks,

Xuefu


--
From: wenlong.lwl
Date: 2018-11-05 11:15:35
To:
Subject: Re: [DISCUSS] Flink SQL DDL Design

Hi, Shuyi, thanks for the proposal.

I have two concerns about the table ddl:

1. how about remove the source/sink mark from the ddl, because it is not
necessary, the framework determine the table referred is a source or a sink
according to the context of the query using the table. it will be more
convenient for use defining a table which can be both a source and sink,
and more convenient for catalog to persistent and manage the meta infos.

2. how about just keeping one pure string map as parameters for table, like
create tabe Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = ’kafka’,
connector.property-version = ’1’,
connector.version = ’0.10’,
connector.properties.topic = ‘test-kafka-topic’,
connector.properties.startup-mode = ‘latest-offset’,
connector.properties.specific-offset = ‘offset’,
format.type = 'json'
format.prperties.version=’1’,
format.derive-schema = 'true'
);
Because:
1. in TableFactory, what user use is a string map properties, defining
parameters by string-map can be the closest way to mapping how user use the
parameters.
2. The table descriptor can be extended by user, like what is done in Kafka
and Json, it means that the parameter keys in connector or format can be
different in different implementation, we can not restrict the key in a
specified set, so we need a map in connector scope and a map in
connector.properties scope. why not just give user a single map, let them
put parameters in a format they like, which is also the simplest way to
implement DDL parser.
3. whether we can define a format clause or not, depends on the
implementation of the connector, using different clause in DDL may make a
misunderstanding that we can combine the connectors with arbitrary formats,
which may not work actually.

On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:


+1, Thanks for the proposal.

I guess this is a long-awaited change. This can vastly increase the
functionalities of the SQL Client as it will be possible to use complex
extensions like for example those provided by Apache Bahir[1].

Best Regards,
Dom.

[1]
https://github.com/apache/bahir-flink

Sat, 3 Nov 2018 at 17:17, Rong Rong wrote:


+1. Thanks for putting the proposal together Shuyi.

DDL has been brought up in a couple of times previously [1,2].

Utilizing

DDL will definitely be a great extension to the current Flink SQL to
systematically support some of the previously brought up features such

as

[3]. And it will also be beneficial to see the document closely aligned
with the previous discussion for unified SQL connector API [4].

I also left a few comments on the doc. Looking forward to the alignment
with the other couple of efforts and contributing to them!

Best,
Rong

[1]



http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E

[2]



http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E

[3] https://issues.apache.org/jira/browse/FLINK-8003
[4]



http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E


On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:


Thanks Shuyi!

I left some comments there. I think the design of SQL DDL and

Flink-Hive

integration/External catalog enhancemen

Re: [DISCUSS] Flink SQL DDL Design

2018-11-26 Thread Shuyi Chen
Thanks a lot, Xuefu. I was busy with some other stuff for the last 2 weeks,
but we are definitely interested in moving this forward. I think once the
unified connector API design [1] is done, we can finalize the DDL design as
well and start creating concrete subtasks to collaborate on the
implementation with the community.

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing

On Mon, Nov 26, 2018 at 7:01 PM Zhang, Xuefu 
wrote:

> Hi Shuyi,
>
> I'm wondering if you folks still have the bandwidth working on this.
>
> We have some dedicated resource and like to move this forward. We can
> collaborate.
>
> Thanks,
>
> Xuefu
>
>
> --
> From: wenlong.lwl
> Date: 2018-11-05 11:15:35
> To:
> Subject: Re: [DISCUSS] Flink SQL DDL Design
>
> Hi, Shuyi, thanks for the proposal.
>
> I have two concerns about the table ddl:
>
> 1. how about remove the source/sink mark from the ddl, because it is not
> necessary, the framework determine the table referred is a source or a sink
> according to the context of the query using the table. it will be more
> convenient for use defining a table which can be both a source and sink,
> and more convenient for catalog to persistent and manage the meta infos.
>
> 2. how about just keeping one pure string map as parameters for table, like
> create tabe Kafka10SourceTable (
> intField INTEGER,
> stringField VARCHAR(128),
> longField BIGINT,
> rowTimeField TIMESTAMP
> ) with (
> connector.type = ’kafka’,
> connector.property-version = ’1’,
> connector.version = ’0.10’,
> connector.properties.topic = ‘test-kafka-topic’,
> connector.properties.startup-mode = ‘latest-offset’,
> connector.properties.specific-offset = ‘offset’,
> format.type = 'json'
> format.prperties.version=’1’,
> format.derive-schema = 'true'
> );
> Because:
> 1. in TableFactory, what user use is a string map properties, defining
> parameters by string-map can be the closest way to mapping how user use the
> parameters.
> 2. The table descriptor can be extended by user, like what is done in Kafka
> and Json, it means that the parameter keys in connector or format can be
> different in different implementation, we can not restrict the key in a
> specified set, so we need a map in connector scope and a map in
> connector.properties scope. why not just give user a single map, let them
> put parameters in a format they like, which is also the simplest way to
> implement DDL parser.
> 3. whether we can define a format clause or not, depends on the
> implementation of the connector, using different clause in DDL may make a
> misunderstanding that we can combine the connectors with arbitrary formats,
> which may not work actually.
>
> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:
>
> > +1, Thanks for the proposal.
> >
> > I guess this is a long-awaited change. This can vastly increase the
> > functionalities of the SQL Client as it will be possible to use complex
> > extensions like for example those provided by Apache Bahir[1].
> >
> > Best Regards,
> > Dom.
> >
> > [1]
> > https://github.com/apache/bahir-flink
> >
> > sob., 3 lis 2018 o 17:17 Rong Rong  napisał(a):
> >
> > > +1. Thanks for putting the proposal together Shuyi.
> > >
> > > DDL has been brought up in a couple of times previously [1,2].
> Utilizing
> > > DDL will definitely be a great extension to the current Flink SQL to
> > > systematically support some of the previously brought up features such
> as
> > > [3]. And it will also be beneficial to see the document closely aligned
> > > with the previous discussion for unified SQL connector API [4].
> > >
> > > I also left a few comments on the doc. Looking forward to the alignment
> > > with the other couple of efforts and contributing to them!
> > >
> > > Best,
> > > Rong
> > >
> > > [1]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > > [2]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > > [4]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> > >
> > >
> > > On Fri, N

Re: [DISCUSS] Flink SQL DDL Design

2018-11-26 Thread Shuyi Chen
Hi Wenlong, thanks a lot for the comments.

1) I agree we can infer the table type from the queries if the Flink job is
static. However, for SQL Client cases, the query is ad hoc, dynamic, and not
known beforehand. In such cases, we might want to enforce the table open
mode at startup time, so users won't accidentally write to a Kafka topic
that is supposed to be written only by producers outside of the Flink world.
2) As in [1], format and connector are currently first-class concepts in
Flink tables and are required by most table creations, so I think adding
specific keywords for them makes the DDL more organized and readable. But I do
agree that a flattened key-value map is simpler for the parser and easier to
extend. So maybe something like the following makes more sense:

CREATE SOURCE TABLE Kafka10SourceTable (
  intField INTEGER,
  stringField VARCHAR(128) COMMENT 'User IP address',
  longField BIGINT,
  rowTimeField TIMESTAMP
    TIMESTAMPS FROM 'longField'
    WATERMARKS PERIODIC-BOUNDED WITH DELAY '60'
)
COMMENT 'Kafka Source Table of topic user_ip_address'
CONNECTOR (
  type = 'kafka',
  property-version = '1',
  version = '0.10',
  properties.topic = 'test-kafka-topic',
  properties.startup-mode = 'latest-offset',
  properties.specific-offset = 'offset'
)
FORMAT (
  format.type = 'json',
  format.properties.version = '1',
  format.derive-schema = 'true'
)
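
For comparison, the grouped CONNECTOR (...) and FORMAT (...) clauses above
should carry the same information as the single flat map in wenlong's earlier
example; a sketch of the flattened equivalent (time-attribute clauses omitted,
and the key prefixes are assumed for illustration rather than prescribed by the
proposal):

CREATE SOURCE TABLE Kafka10SourceTable (
  intField INTEGER,
  stringField VARCHAR(128) COMMENT 'User IP address',
  longField BIGINT,
  rowTimeField TIMESTAMP
) WITH (
  connector.type = 'kafka',
  connector.property-version = '1',
  connector.version = '0.10',
  connector.properties.topic = 'test-kafka-topic',
  connector.properties.startup-mode = 'latest-offset',
  connector.properties.specific-offset = 'offset',
  format.type = 'json',
  format.properties.version = '1',
  format.derive-schema = 'true'
);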

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.41fd6rs7b3cf

On Sun, Nov 4, 2018 at 7:15 PM wenlong.lwl  wrote:

> Hi, Shuyi, thanks for the proposal.
>
> I have two concerns about the table ddl:
>
> 1. how about remove the source/sink mark from the ddl, because it is not
> necessary, the framework determine the table referred is a source or a sink
> according to the context of the query using the table. it will be more
> convenient for use defining a table which can be both a source and sink,
> and more convenient for catalog to persistent and manage the meta infos.
>
> 2. how about just keeping one pure string map as parameters for table, like
> create tabe Kafka10SourceTable (
> intField INTEGER,
> stringField VARCHAR(128),
> longField BIGINT,
> rowTimeField TIMESTAMP
> ) with (
> connector.type = ’kafka’,
> connector.property-version = ’1’,
> connector.version = ’0.10’,
> connector.properties.topic = ‘test-kafka-topic’,
> connector.properties.startup-mode = ‘latest-offset’,
> connector.properties.specific-offset = ‘offset’,
> format.type = 'json'
> format.prperties.version=’1’,
> format.derive-schema = 'true'
> );
> Because:
> 1. in TableFactory, what user use is a string map properties, defining
> parameters by string-map can be the closest way to mapping how user use the
> parameters.
> 2. The table descriptor can be extended by user, like what is done in Kafka
> and Json, it means that the parameter keys in connector or format can be
> different in different implementation, we can not restrict the key in a
> specified set, so we need a map in connector scope and a map in
> connector.properties scope. why not just give user a single map, let them
> put parameters in a format they like, which is also the simplest way to
> implement DDL parser.
> 3. whether we can define a format clause or not, depends on the
> implementation of the connector, using different clause in DDL may make a
> misunderstanding that we can combine the connectors with arbitrary formats,
> which may not work actually.
>
> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:
>
> > +1,  Thanks for the proposal.
> >
> > I guess this is a long-awaited change. This can vastly increase the
> > functionalities of the SQL Client as it will be possible to use complex
> > extensions like for example those provided by Apache Bahir[1].
> >
> > Best Regards,
> > Dom.
> >
> > [1]
> > https://github.com/apache/bahir-flink
> >
> > sob., 3 lis 2018 o 17:17 Rong Rong  napisał(a):
> >
> > > +1. Thanks for putting the proposal together Shuyi.
> > >
> > > DDL has been brought up in a couple of times previously [1,2].
> Utilizing
> > > DDL will definitely be a great extension to the current Flink SQL to
> > > systematically support some of the previously brought up features such
> as
> > > [3]. And it will also be beneficial to see the document closely aligned
> > > with the previous discussion for unified SQL connector API [4].
> > >
> > > I also left a few comments on the doc. Looking forward to the alignment
> > > with the other couple of efforts and contributing to them!
> > >
> > > Best,
> > > Rong
> > >
> > > [1]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > > [2]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > > [4]
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201

Re: [DISCUSS] Flink SQL DDL Design

2018-11-26 Thread Zhang, Xuefu
Hi Shuyi, 

I'm wondering if you folks still have the bandwidth to work on this.

We have some dedicated resources and would like to move this forward. We can
collaborate.

Thanks, 

Xuefu 


--
From: wenlong.lwl
Date: 2018-11-05 11:15:35
To:
Subject: Re: [DISCUSS] Flink SQL DDL Design

Hi, Shuyi, thanks for the proposal.

I have two concerns about the table ddl:

1. how about remove the source/sink mark from the ddl, because it is not
necessary, the framework determine the table referred is a source or a sink
according to the context of the query using the table. it will be more
convenient for use defining a table which can be both a source and sink,
and more convenient for catalog to persistent and manage the meta infos.

2. how about just keeping one pure string map as parameters for table, like
create tabe Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = ’kafka’,
connector.property-version = ’1’,
connector.version = ’0.10’,
connector.properties.topic = ‘test-kafka-topic’,
connector.properties.startup-mode = ‘latest-offset’,
connector.properties.specific-offset = ‘offset’,
format.type = 'json'
format.prperties.version=’1’,
format.derive-schema = 'true'
);
Because:
1. in TableFactory, what user use is a string map properties, defining
parameters by string-map can be the closest way to mapping how user use the
parameters.
2. The table descriptor can be extended by user, like what is done in Kafka
and Json, it means that the parameter keys in connector or format can be
different in different implementation, we can not restrict the key in a
specified set, so we need a map in connector scope and a map in
connector.properties scope. why not just give user a single map, let them
put parameters in a format they like, which is also the simplest way to
implement DDL parser.
3. whether we can define a format clause or not, depends on the
implementation of the connector, using different clause in DDL may make a
misunderstanding that we can combine the connectors with arbitrary formats,
which may not work actually.

On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:

> +1, Thanks for the proposal.
>
> I guess this is a long-awaited change. This can vastly increase the
> functionalities of the SQL Client as it will be possible to use complex
> extensions like for example those provided by Apache Bahir[1].
>
> Best Regards,
> Dom.
>
> [1]
> https://github.com/apache/bahir-flink
>
> sob., 3 lis 2018 o 17:17 Rong Rong  napisał(a):
>
> > +1. Thanks for putting the proposal together Shuyi.
> >
> > DDL has been brought up in a couple of times previously [1,2]. Utilizing
> > DDL will definitely be a great extension to the current Flink SQL to
> > systematically support some of the previously brought up features such as
> > [3]. And it will also be beneficial to see the document closely aligned
> > with the previous discussion for unified SQL connector API [4].
> >
> > I also left a few comments on the doc. Looking forward to the alignment
> > with the other couple of efforts and contributing to them!
> >
> > Best,
> > Rong
> >
> > [1]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > [2]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > [4]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> >
> >
> > On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:
> >
> > > Thanks Shuyi!
> > >
> > > I left some comments there. I think the design of SQL DDL and
> Flink-Hive
> > > integration/External catalog enhancements will work closely with each
> > > other. Hope we are well aligned on the directions of the two designs,
> > and I
> > > look forward to working with you guys on both!
> > >
> > > Bowen
> > >
> > >
> > > On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > SQL DDL support has been a long-time ask from the community. Current
> > > Flink
> > > > SQL support only DML (e.g. SELECT and INSERT statements). In its
> > current
> > > > form, Flink SQL users still need to define/create table sources and
> > sinks
> > > > programmatically in Java/Scala. Also, in SQL Client, without DDL

Re: [DISCUSS] Flink SQL DDL Design

2018-11-04 Thread wenlong.lwl
Hi, Shuyi, thanks for the proposal.

I have two concerns about the table ddl:

1. How about removing the source/sink mark from the DDL? It is not necessary,
because the framework can determine whether the referred table is a source or a
sink from the context of the query that uses it. It would be more convenient
for users defining a table that can be both a source and a sink, and more
convenient for the catalog to persist and manage the meta info (see the sketch
below).

2. How about just keeping one pure string map as parameters for the table, like
create table Kafka10SourceTable (
intField INTEGER,
stringField VARCHAR(128),
longField BIGINT,
rowTimeField TIMESTAMP
) with (
connector.type = 'kafka',
connector.property-version = '1',
connector.version = '0.10',
connector.properties.topic = 'test-kafka-topic',
connector.properties.startup-mode = 'latest-offset',
connector.properties.specific-offset = 'offset',
format.type = 'json',
format.properties.version = '1',
format.derive-schema = 'true'
);
Because:
1. In TableFactory, what users work with is a string map of properties;
defining parameters as a string map is the closest way to match how users
actually consume them.
2. The table descriptor can be extended by users, as is done for Kafka and
JSON, which means the parameter keys in connector or format can differ between
implementations. We cannot restrict the keys to a fixed set, so we would need a
map in the connector scope and another in the connector.properties scope. Why
not just give users a single map and let them put parameters in a format they
like, which is also the simplest way to implement the DDL parser?
3. Whether we can define a format clause or not depends on the implementation
of the connector; using a separate clause in the DDL may create the
misunderstanding that we can combine the connectors with arbitrary formats,
which may not actually work.
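
To illustrate the first concern, a small sketch (the table names and the query
are invented) of how one definition could serve both roles once the SOURCE/SINK
keyword is dropped; whether a table acts as a source or a sink then follows
from the statement that uses it:

create table Orders (
  orderId BIGINT,
  userId BIGINT,
  amount DOUBLE
) with (
  connector.type = 'kafka',
  connector.properties.topic = 'orders',
  format.type = 'json'
);

create table OrderStats (
  userId BIGINT,
  totalAmount DOUBLE
) with (
  connector.type = 'kafka',
  connector.properties.topic = 'order-stats',
  format.type = 'json'
);

-- Orders is read as a source, OrderStats is written as a sink;
-- neither table carries a SOURCE/SINK marker in its DDL.
INSERT INTO OrderStats
SELECT userId, SUM(amount) FROM Orders GROUP BY userId;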

On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński  wrote:

> +1,  Thanks for the proposal.
>
> I guess this is a long-awaited change. This can vastly increase the
> functionalities of the SQL Client as it will be possible to use complex
> extensions like for example those provided by Apache Bahir[1].
>
> Best Regards,
> Dom.
>
> [1]
> https://github.com/apache/bahir-flink
>
> sob., 3 lis 2018 o 17:17 Rong Rong  napisał(a):
>
> > +1. Thanks for putting the proposal together Shuyi.
> >
> > DDL has been brought up in a couple of times previously [1,2]. Utilizing
> > DDL will definitely be a great extension to the current Flink SQL to
> > systematically support some of the previously brought up features such as
> > [3]. And it will also be beneficial to see the document closely aligned
> > with the previous discussion for unified SQL connector API [4].
> >
> > I also left a few comments on the doc. Looking forward to the alignment
> > with the other couple of efforts and contributing to them!
> >
> > Best,
> > Rong
> >
> > [1]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > [2]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > [4]
> >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> >
> >
> > On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:
> >
> > > Thanks Shuyi!
> > >
> > > I left some comments there. I think the design of SQL DDL and
> Flink-Hive
> > > integration/External catalog enhancements will work closely with each
> > > other. Hope we are well aligned on the directions of the two designs,
> > and I
> > > look forward to working with you guys on both!
> > >
> > > Bowen
> > >
> > >
> > > On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen  wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > SQL DDL support has been a long-time ask from the community. Current
> > > Flink
> > > > SQL support only DML (e.g. SELECT and INSERT statements). In its
> > current
> > > > form, Flink SQL users still need to define/create table sources and
> > sinks
> > > > programmatically in Java/Scala. Also, in SQL Client, without DDL
> > support,
> > > > the current implementation does not allow dynamical creation of
> table,
> > > type
> > > > or functions with SQL, this adds friction for its adoption.
> > > >
> > > > I drafted a design doc [1] with a few other community members that
> > > proposes
> > > > the design and implementation for adding DDL support in Flink. The
> > > initial
> > > > design considers DDL for table, view, type, library and function. It
> > will
> > > > be great to get feedback on the design from the community, and align
> > with
> > > > latest effort in unified SQL connector API  [2] and Flink Hive
> > > integration
> > > > [3].
> > > >
> > > > Any feedback is highly appreciated.
> > > >
> > > > Thanks
> > > > Shuyi Chen
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://docs.google.com/docum

Re: [DISCUSS] Flink SQL DDL Design

2018-11-04 Thread Dominik Wosiński
+1,  Thanks for the proposal.

I guess this is a long-awaited change. This can vastly increase the
functionalities of the SQL Client as it will be possible to use complex
extensions like for example those provided by Apache Bahir[1].

Best Regards,
Dom.

[1]
https://github.com/apache/bahir-flink

Sat, 3 Nov 2018 at 17:17, Rong Rong wrote:

> +1. Thanks for putting the proposal together Shuyi.
>
> DDL has been brought up in a couple of times previously [1,2]. Utilizing
> DDL will definitely be a great extension to the current Flink SQL to
> systematically support some of the previously brought up features such as
> [3]. And it will also be beneficial to see the document closely aligned
> with the previous discussion for unified SQL connector API [4].
>
> I also left a few comments on the doc. Looking forward to the alignment
> with the other couple of efforts and contributing to them!
>
> Best,
> Rong
>
> [1]
>
> http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> [2]
>
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
>
> [3] https://issues.apache.org/jira/browse/FLINK-8003
> [4]
>
> http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
>
>
> On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:
>
> > Thanks Shuyi!
> >
> > I left some comments there. I think the design of SQL DDL and Flink-Hive
> > integration/External catalog enhancements will work closely with each
> > other. Hope we are well aligned on the directions of the two designs,
> and I
> > look forward to working with you guys on both!
> >
> > Bowen
> >
> >
> > On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen  wrote:
> >
> > > Hi everyone,
> > >
> > > SQL DDL support has been a long-time ask from the community. Current
> > Flink
> > > SQL support only DML (e.g. SELECT and INSERT statements). In its
> current
> > > form, Flink SQL users still need to define/create table sources and
> sinks
> > > programmatically in Java/Scala. Also, in SQL Client, without DDL
> support,
> > > the current implementation does not allow dynamical creation of table,
> > type
> > > or functions with SQL, this adds friction for its adoption.
> > >
> > > I drafted a design doc [1] with a few other community members that
> > proposes
> > > the design and implementation for adding DDL support in Flink. The
> > initial
> > > design considers DDL for table, view, type, library and function. It
> will
> > > be great to get feedback on the design from the community, and align
> with
> > > latest effort in unified SQL connector API  [2] and Flink Hive
> > integration
> > > [3].
> > >
> > > Any feedback is highly appreciated.
> > >
> > > Thanks
> > > Shuyi Chen
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> > > [2]
> > >
> > >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > [3]
> > >
> > >
> >
> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> > > --
> > > "So you have to trust that the dots will somehow connect in your
> future."
> > >
> >
>


Re: [DISCUSS] Flink SQL DDL Design

2018-11-03 Thread Rong Rong
+1. Thanks for putting the proposal together Shuyi.

DDL has been brought up a couple of times previously [1,2]. Utilizing
DDL will definitely be a great extension to the current Flink SQL to
systematically support some of the previously brought up features such as
[3]. And it will also be beneficial to see the document closely aligned
with the previous discussion for unified SQL connector API [4].

I also left a few comments on the doc. Looking forward to the alignment
with the other couple of efforts and contributing to them!

Best,
Rong

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
[2]
http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E

[3] https://issues.apache.org/jira/browse/FLINK-8003
[4]
http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E


On Fri, Nov 2, 2018 at 10:22 AM Bowen Li  wrote:

> Thanks Shuyi!
>
> I left some comments there. I think the design of SQL DDL and Flink-Hive
> integration/External catalog enhancements will work closely with each
> other. Hope we are well aligned on the directions of the two designs, and I
> look forward to working with you guys on both!
>
> Bowen
>
>
> On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen  wrote:
>
> > Hi everyone,
> >
> > SQL DDL support has been a long-time ask from the community. Current
> Flink
> > SQL support only DML (e.g. SELECT and INSERT statements). In its current
> > form, Flink SQL users still need to define/create table sources and sinks
> > programmatically in Java/Scala. Also, in SQL Client, without DDL support,
> > the current implementation does not allow dynamical creation of table,
> type
> > or functions with SQL, this adds friction for its adoption.
> >
> > I drafted a design doc [1] with a few other community members that
> proposes
> > the design and implementation for adding DDL support in Flink. The
> initial
> > design considers DDL for table, view, type, library and function. It will
> > be great to get feedback on the design from the community, and align with
> > latest effort in unified SQL connector API  [2] and Flink Hive
> integration
> > [3].
> >
> > Any feedback is highly appreciated.
> >
> > Thanks
> > Shuyi Chen
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> > [2]
> >
> >
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > [3]
> >
> >
> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>


Re: [DISCUSS] Flink SQL DDL Design

2018-11-02 Thread Bowen Li
Thanks Shuyi!

I left some comments there. I think the design of SQL DDL and Flink-Hive
integration/External catalog enhancements will work closely with each
other. Hope we are well aligned on the directions of the two designs, and I
look forward to working with you guys on both!

Bowen


On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen  wrote:

> Hi everyone,
>
> SQL DDL support has been a long-time ask from the community. Currently, Flink
> SQL supports only DML (e.g. SELECT and INSERT statements). In its current
> form, Flink SQL users still need to define/create table sources and sinks
> programmatically in Java/Scala. Also, in SQL Client, without DDL support,
> the current implementation does not allow dynamic creation of tables, types,
> or functions with SQL, which adds friction to its adoption.
>
> I drafted a design doc [1] with a few other community members that proposes
> the design and implementation for adding DDL support in Flink. The initial
> design considers DDL for table, view, type, library and function. It will
> be great to get feedback on the design from the community, and align with
> latest effort in unified SQL connector API  [2] and Flink Hive integration
> [3].
>
> Any feedback is highly appreciated.
>
> Thanks
> Shuyi Chen
>
> [1]
>
> https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> [2]
>
> https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> [3]
>
> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> --
> "So you have to trust that the dots will somehow connect in your future."
>