Re: How to contribute to Streaming Table API and StreamSQL

2016-10-13 Thread Fabian Hueske
Hi Juho,

Yes, FLINK-4557 is an umbrella issue for windowed aggregations (both for
stream and batch) for the Table API (not including SQL).
The features are described in FLIP-11 [1]. Note, there is currently a
discussion about certain aspects of the proposed syntax [2].
There is a pull request for GroupWindow aggregates on streams (not batch)
[3] that should soon be good to merge (parts might be adapted depending on
the outcome of the discussion).
This feature should come with the next minor Flink release (1.2.0).
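
To give a rough idea, a tumbling window aggregate in the FLIP-11 draft syntax
looks roughly like the sketch below. Table and field names are made up, and
details such as the order of window() and groupBy() or the window keywords may
still change with the discussion in [2]:

    // Scala Table API sketch following the FLIP-11 draft (subject to change)
    val result = sensors                          // a streaming Table
      .window(Tumble over 10.minutes as 'w)       // 10-minute tumbling window, aliased as 'w
      .groupBy('w, 'sensorId)                     // group by window and key
      .select('sensorId, 'temperature.avg)        // one result row per window and key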

Support for GroupWindows in SQL depends on Apache Calcite. The Calcite
community is currently working on adding the required keywords to the
parser, validator and internal representation [4].
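
The target syntax follows Calcite's streaming SQL proposal, i.e., queries
roughly along these lines (the schema is illustrative and the exact functions
depend on what Calcite ends up implementing):

    SELECT STREAM
      TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
      productId,
      COUNT(*) AS cnt
    FROM Orders
    GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId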

Let me know if you have further questions.

Best, Fabian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/RE-DISCUSS-FLIP-11-Table-API-Stream-Aggregations-tp13990.html
[3] https://github.com/apache/flink/pull/2562
[4] https://issues.apache.org/jira/browse/CALCITE-1345

2016-10-13 9:51 GMT+02:00 Juho Autio :

> Hi Fabian!
>
> Is this the feature that will also add windowed aggregates to streaming
> SQL:
> https://issues.apache.org/jira/browse/FLINK-4557 (Table API Stream
> Aggregations)?
>
> You wrote:
>
> > However, for the 1.2 release we plan to focus on the streaming
> > Table API and Stream SQL to add support for windowed aggregates and joins,
> > which corresponds to Task 7 and 9 in the design document.
>
> Is there any WIP implementation yet? I'd like to try it as soon as
> possible. Where can we track progress for Stream SQL windowed aggregates?
>
> (A little bit about our use case, if you're interested:
> In our company we enable stream aggregates declaratively on arbitrary JSON
> fields. Users can choose an aggregate function, a field to aggregate,
> group-by fields, and filters. At the moment we use a custom ReduceFunction
> that accumulates the aggregates. Flink's upcoming Streaming SQL seems to
> answer our use case perfectly (especially the Calcite sample query in
> https://flink.apache.org/news/2016/05/24/stream-sql.html). We would like to
> use that SQL instead of our custom reducer. In particular, we want to switch
> to defining the user aggregates directly in that SQL syntax instead of the
> JSON configuration that we currently use for this purpose.)
>
> Cheers,
> Juho
>
> On Fri, Jun 17, 2016 at 2:26 PM, Fabian Hueske  wrote:
>
> > If we want to have it in Stream SQL yes. Although we can also think about
> > extending the Calcite parser ourselves.
> > IMO, it makes sense to talk to them first, also to get more feedback on
> the
> > feature.
> >
> >
> > 2016-06-17 13:18 GMT+02:00 Jark Wu :
> >
> > > Hi Fabian,
> > >
> > > Yea, we can immediately start to work on non-windowed aggregates. But it
> > > seems that Calcite's StreamSQL doesn't support non-windowed aggregates
> > > (it is also not on the roadmap). So we may need to propose this feature
> > > to the Calcite community?
> > >
> > > - Jark Wu
> > >
> > > > On Jun 17, 2016, at 5:41 PM, Fabian Hueske wrote:
> > > >
> > > > Hi Jark Wu,
> > > >
> > > > I agree about the non-windowed aggregates. If there are actual use
> > cases
> > > > for this operator, we should definitely support it.
> > > > Since it does not depend on windows or time, we can immediately start
> > to
> > > > work on it. In principle, it should be rather easy to implement.
> > > > However, we have to check how well it integrates with the current
> state
> > > of
> > > > Calcite.
> > > >
> > > > I think forking off a feature branch is a good idea. We have done
> that
> > > > before (e.g., for porting the Table API on top of Calcite), but it is
> > not
> > > > so common in the Flink community.
> > > > So I would first send a note to the dev list and check that nobody
> > > objects.
> > > >
> > > > I think we can decouple the development of the Table API and SQL.
> > > Although
> > > > it is desirable to have the same feature set in both APIs, I would
> not
> > be
> > > > strict about it.
> > > > However, the Table API does also depend on Calcite because all Table
> > API
> > > > queries go through Calcite's logical plan representation and
> optimizer.
> > > By
> > > > decoupling the SQL and Table API feature development, we do not need
> to
> > > > wait for the SQL parser but might still need certain features
> in
> > > the
> > > > logical plan or optimizer. I hope we can solve a lot with custom
> > RelNodes
> > > > and optimizer rules which should eventually be contributed back to
> > > Calcite.
> > > >
> > > > Best, Fabian
> > > >
> > > >
> > > > 2016-06-17 9:48 GMT+02:00 Jark Wu :
> > > >
> > > >> Hi Fabian,
> > > >>
> > >> A lot of our business use cases rely on non-windowed aggregations. And
> > >> there is a little difference between 

Re: How to contribute to Streaming Table API and StreamSQL

2016-10-13 Thread Juho Autio
Hi Fabian!

Is this the feature that will also add windowed aggregates to streaming SQL:
https://issues.apache.org/jira/browse/FLINK-4557 (Table API Stream
Aggregations)?

You wrote:

> However, for the 1.2 release we plan to focus on the streaming
> Table API and Stream SQL to add support for windowed aggregates and joins,
> which corresponds to Task 7 and 9 in the design document.

Is there any WIP implementation yet? I'd like to try it as soon as
possible. Where can we track progress for Stream SQL windowed aggregates?

(A little bit about our use case, if you're interested:
In our company we enable stream aggregates declaratively on arbitrary JSON
fields. Users can choose an aggregate function, a field to aggregate, group-by
fields, and filters. At the moment we use a custom ReduceFunction that
accumulates the aggregates. Flink's upcoming Streaming SQL seems to answer
our use case perfectly (especially the Calcite sample query in
https://flink.apache.org/news/2016/05/24/stream-sql.html). We would like to
use that SQL instead of our custom reducer. In particular, we want to switch to
defining the user aggregates directly in that SQL syntax instead of the
JSON configuration that we currently use for this purpose.)
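
For example, one of our JSON-configured aggregations (aggregate function and
field, group-by fields, filter) would map to something like the query below.
The schema and window are purely illustrative and the syntax follows Calcite's
streaming SQL documentation, so the final Flink syntax may differ:

    SELECT STREAM
      TUMBLE_END(rowtime, INTERVAL '1' MINUTE) AS windowEnd,
      country, device,                        -- group-by fields from the config
      SUM(purchaseAmount) AS totalPurchases   -- aggregate function and field
    FROM events
    WHERE eventType = 'purchase'              -- filter from the config
    GROUP BY TUMBLE(rowtime, INTERVAL '1' MINUTE), country, device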

Cheers,
Juho

On Fri, Jun 17, 2016 at 2:26 PM, Fabian Hueske  wrote:

> If we want to have it in Stream SQL yes. Although we can also think about
> extending the Calcite parser ourselves.
> IMO, it makes sense to talk to them first, also to get more feedback on the
> feature.
>
>
> 2016-06-17 13:18 GMT+02:00 Jark Wu :
>
> > Hi Fabian,
> >
> > Yea, we can immediately start to work on non-windowed aggregates. But it
> > seems that Calcite's StreamSQL doesn't support non-windowed aggregates
> > (it is also not on the roadmap). So we may need to propose this feature
> > to the Calcite community?
> >
> > - Jark Wu
> >
> > > On Jun 17, 2016, at 5:41 PM, Fabian Hueske wrote:
> > >
> > > Hi Jark Wu,
> > >
> > > I agree about the non-windowed aggregates. If there are actual use
> cases
> > > for this operator, we should definitely support it.
> > > Since it does not depend on windows or time, we can immediately start
> to
> > > work on it. In principle, it should be rather easy to implement.
> > > However, we have to check how well it integrates with the current state
> > of
> > > Calcite.
> > >
> > > I think forking off a feature branch is a good idea. We have done that
> > > before (e.g., for porting the Table API on top of Calcite), but it is
> not
> > > so common in the Flink community.
> > > So I would first send a note to the dev list and check that nobody
> > objects.
> > >
> > > I think we can decouple the development of the Table API and SQL.
> > Although
> > > it is desirable to have the same feature set in both APIs, I would not
> be
> > > strict about it.
> > > However, the Table API does also depend on Calcite because all Table
> API
> > > queries go through Calcite's logical plan representation and optimizer.
> > By
> > > decoupling the SQL and Table API feature development, we do not need to
> > > wait for the SQL parser but might still need certain features in
> > the
> > > logical plan or optimizer. I hope we can solve a lot with custom
> RelNodes
> > > and optimizer rules which should eventually be contributed back to
> > Calcite.
> > >
> > > Best, Fabian
> > >
> > >
> > > 2016-06-17 9:48 GMT+02:00 Jark Wu :
> > >
> > >> Hi Fabian,
> > >>
> > >> A lot of our business use cases rely on non-windowed aggregations. And
> > >> there is a little difference between a non-windowed aggregate and the
> > >> Row window operator: the latter is bound to a certain window and emits,
> > >> for every incoming row, the result over the N preceding rows, while the
> > >> former emits the aggregate result over all elements seen so far. So I
> > >> suggest adding them for more complete semantics.
> > >>
> > >> Regarding the windowed aggregate task, I agree with that, and I'm looking
> > >> forward to seeing the corresponding JIRA issues created as soon as
> > >> possible. After that, we can start working on an independent branch
> > >> without waiting for the 1.1 release. But I'm still a little concerned
> > >> about Calcite's support, as we must wait for Calcite to support the
> > >> corresponding syntax and to release a version with it. If we can separate
> > >> the task into Table API and SQL parts, we may not be blocked by Calcite
> > >> too much.
> > >>
> > >> What do you think?
> > >>
> > >> - Jark Wu
> > >>
> > >>> On Jun 16, 2016, at 8:37 PM, Fabian Hueske wrote:
> > >>>
> > >>> Hi Jark,
> > >>>
> > >>> thanks for sharing Blink's Streaming Table API. It seems to be close
> to
> > >> the
> > >>> DataStream API, while the Table API draft I shared is more similar to
> > >>> Calcite's proposal.
> > >>> You are right, the current draft does not include running
> > (non-windowed)
> > >>> aggregates. We were not sure how 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-17 Thread Fabian Hueske
If we want to have it in Stream SQL yes. Although we can also think about
extending the Calcite parser ourselves.
IMO, it makes sense to talk to them first, also to get more feedback on the
feature.


2016-06-17 13:18 GMT+02:00 Jark Wu :

> Hi Fabian,
>
> Yea, we can immediately start to work on non-windowed aggregates. But it
> seems that Calcite's StreamSQL doesn't support non-windowed aggregates
> (it is also not on the roadmap). So we may need to propose this feature
> to the Calcite community?
>
> - Jark Wu
>
> > On Jun 17, 2016, at 5:41 PM, Fabian Hueske wrote:
> >
> > Hi Jark Wu,
> >
> > I agree about the non-windowed aggregates. If there are actual use cases
> > for this operator, we should definitely support it.
> > Since it does not depend on windows or time, we can immediately start to
> > work on it. In principle, it should be rather easy to implement.
> > However, we have to check how well it integrates with the current state
> of
> > Calcite.
> >
> > I think forking off a feature branch is a good idea. We have done that
> > before (e.g., for porting the Table API on top of Calcite), but it is not
> > so common in the Flink community.
> > So I would first send a note to the dev list and check that nobody
> objects.
> >
> > I think we can decouple the development of the Table API and SQL.
> Although
> > it is desirable to have the same feature set in both APIs, I would not be
> > strict about it.
> > However, the Table API does also depend on Calcite because all Table API
> > queries go through Calcite's logical plan representation and optimizer.
> By
> > decoupling the SQL and Table API feature development, we do not need to
> > wait for the SQL parser but might still need certain features in
> the
> > logical plan or optimizer. I hope we can solve a lot with custom RelNodes
> > and optimizer rules which should eventually be contributed back to
> Calcite.
> >
> > Best, Fabian
> >
> >
> > 2016-06-17 9:48 GMT+02:00 Jark Wu :
> >
> >> Hi Fabian,
> >>
> >> A lot of our business use cases rely on non-windowed aggregations. And
> >> there is a little difference between a non-windowed aggregate and the
> >> Row window operator: the latter is bound to a certain window and emits,
> >> for every incoming row, the result over the N preceding rows, while the
> >> former emits the aggregate result over all elements seen so far. So I
> >> suggest adding them for more complete semantics.
> >>
> >> Regarding the windowed aggregate task, I agree with that, and I'm looking
> >> forward to seeing the corresponding JIRA issues created as soon as
> >> possible. After that, we can start working on an independent branch
> >> without waiting for the 1.1 release. But I'm still a little concerned
> >> about Calcite's support, as we must wait for Calcite to support the
> >> corresponding syntax and to release a version with it. If we can separate
> >> the task into Table API and SQL parts, we may not be blocked by Calcite
> >> too much.
> >>
> >> What do you think?
> >>
> >> - Jark Wu
> >>
> >>> On Jun 16, 2016, at 8:37 PM, Fabian Hueske wrote:
> >>>
> >>> Hi Jark,
> >>>
> >>> thanks for sharing Blink's Streaming Table API. It seems to be close to
> >> the
> >>> DataStream API, while the Table API draft I shared is more similar to
> >>> Calcite's proposal.
> >>> You are right, the current draft does not include running
> (non-windowed)
> >>> aggregates. We were not sure how useful these are since these
> aggregates
> >>> are unbound and might become meaningless after being applied on a very
> >> long
> >>> stream. However, we can certainly add them, if users request them.
> >>> An alternative to running aggregates could be what I called "Row window
> >>> operators" in the streaming Table API draft. These operators emit an
> >>> aggregate for each incoming row, however the aggregate is bound to a
> >>> certain window around the row like the 10 rows preceding the row for
> >> which
> >>> the aggregate is computed. Calcite calls these windows "Sliding
> windows"
> >>> (Attention: This is different from Flink's terminology, in Flink
> sliding
> >>> windows are something different). Row windows are similar to running
> >>> aggregates in that they emit a row for each incoming row. You can also
> >>> think of them as a (Flink) sliding count window which is evaluated for
> >> each
> >>> incoming record.
> >>>
> >>> Further differences are the support of Scalar UDFs in the Table API and
> >> the
> >>> support for joins which have not been drafted for the Table API yet.
> >>> Scalar UDFs are definitely also on our roadmap and with upcoming
> support
> >>> for side inputs, the DataStream API will also support more types of
> >> joins.
> >>>
> >>> Regarding the current state of Stream SQL in Calcite I am not up to
> date.
> >>>
> >>> I would propose to start with the effort of adding support for windowed
> >>> aggregates as follows:
> >>>
> >>> 1) Add support to define a 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-17 Thread Jark Wu
Hi Fabian, 

Yea, we can immediately start to work on non-windowed aggregates. But it seems 
that Calcite's StreamSQL doesn't support non-windowed aggregates (it is also 
not on the roadmap). So we may need to propose this feature to the Calcite 
community? 

- Jark Wu 

> On Jun 17, 2016, at 5:41 PM, Fabian Hueske wrote:
> 
> Hi Jark Wu,
> 
> I agree about the non-windowed aggregates. If there are actual use cases
> for this operator, we should definitely support it.
> Since it does not depend on windows or time, we can immediately start to
> work on it. In principle, it should be rather easy to implement.
> However, we have to check how well it integrates with the current state of
> Calcite.
> 
> I think forking off a feature branch is a good idea. We have done that
> before (e.g., for porting the Table API on top of Calcite), but it is not
> so common in the Flink community.
> So I would first send a note to the dev list and check that nobody objects.
> 
> I think we can decouple the development of the Table API and SQL. Although
> it is desirable to have the same feature set in both APIs, I would not be
> strict about it.
> However, the Table API does also depend on Calcite because all Table API
> queries go through Calcite's logical plan representation and optimizer. By
> decoupling the SQL and Table API feature development, we do not need to
> wait for the SQL parser but might still need certain features in the
> logical plan or optimizer. I hope we can solve a lot with custom RelNodes
> and optimizer rules which should eventually be contributed back to Calcite.
> 
> Best, Fabian
> 
> 
> 2016-06-17 9:48 GMT+02:00 Jark Wu :
> 
>> Hi Fabian,
>> 
>> A lot of our business use cases rely on non-windowed aggregations. And
>> there is a little difference between a non-windowed aggregate and the Row
>> window operator: the latter is bound to a certain window and emits, for
>> every incoming row, the result over the N preceding rows, while the former
>> emits the aggregate result over all elements seen so far. So I suggest
>> adding them for more complete semantics.
>> 
>> Regarding the windowed aggregate task, I agree with that, and I'm looking
>> forward to seeing the corresponding JIRA issues created as soon as possible.
>> After that, we can start working on an independent branch without waiting
>> for the 1.1 release. But I'm still a little concerned about Calcite's
>> support, as we must wait for Calcite to support the corresponding syntax
>> and to release a version with it. If we can separate the task into Table
>> API and SQL parts, we may not be blocked by Calcite too much.
>> 
>> What do you think?
>> 
>> - Jark Wu
>> 
>>> On Jun 16, 2016, at 8:37 PM, Fabian Hueske wrote:
>>> 
>>> Hi Jark,
>>> 
>>> thanks for sharing Blink's Streaming Table API. It seems to be close to
>> the
>>> DataStream API, while the Table API draft I shared is more similar to
>>> Calcite's proposal.
>>> You are right, the current draft does not include running (non-windowed)
>>> aggregates. We were not sure how useful these are since these aggregates
>>> are unbound and might become meaningless after being applied on a very
>> long
>>> stream. However, we can certainly add them, if users request them.
>>> An alternative to running aggregates could be what I called "Row window
>>> operators" in the streaming Table API draft. These operators emit an
>>> aggregate for each incoming row, however the aggregate is bound to a
>>> certain window around the row like the 10 rows preceding the row for
>> which
>>> the aggregate is computed. Calcite calls these windows "Sliding windows"
>>> (Attention: This is different from Flink's terminology, in Flink sliding
>>> windows are something different). Row windows are similar to running
>>> aggregates in that they emit a row for each incoming row. You can also
>>> think of them as a (Flink) sliding count window which is evaluated for
>> each
>>> incoming record.
>>> 
>>> Further differences are the support of Scalar UDFs in the Table API and
>> the
>>> support for joins which have not been drafted for the Table API yet.
>>> Scalar UDFs are definitely also on our roadmap and with upcoming support
>>> for side inputs, the DataStream API will also support more types of
>> joins.
>>> 
>>> Regarding the current state of Stream SQL in Calcite I am not up to date.
>>> 
>>> I would propose to start with the effort of adding support for windowed
>>> aggregates as follows:
>>> 
>>> 1) Add support to define a timestamp / watermark extractor to tables.
>> This
>>> includes defining a "quasi-monotone" column in a Table's schema. Calcite
>>> will use this information to reason about the validity of a query (making
>>> sure that grouping includes at least one monotone attribute).
>>> 2) Add support for sorting a stream on the timestamp attribute. While
>>> sorting itself is not very exciting, it is an easy operation and can be
>>> immediately implemented without worrying about API 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-17 Thread Fabian Hueske
Hi Jark Wu,

I agree about the non-windowed aggregates. If there are actual use cases
for this operator, we should definitely support it.
Since it does not depend on windows or time, we can immediately start to
work on it. In principle, it should be rather easy to implement.
However, we have to check how well it integrates with the current state of
Calcite.

I think forking off a feature branch is a good idea. We have done that
before (e.g., for porting the Table API on top of Calcite), but it is not
so common in the Flink community.
So I would first send a note to the dev list and check that nobody objects.

I think we can decouple the development of the Table API and SQL. Although
it is desirable to have the same feature set in both APIs, I would not be
strict about it.
However, the Table API does also depend on Calcite because all Table API
queries go through Calcite's logical plan representation and optimizer. By
decoupling the SQL and Table API feature development, we do not need to
wait for the SQL parser but might still need certain features in the
logical plan or optimizer. I hope we can solve a lot with custom RelNodes
and optimizer rules which should eventually be contributed back to Calcite.

Best, Fabian


2016-06-17 9:48 GMT+02:00 Jark Wu :

> Hi Fabian,
>
> A lot of our business use cases rely on non-windowed aggregations. And
> there is a little difference between a non-windowed aggregate and the Row
> window operator: the latter is bound to a certain window and emits, for
> every incoming row, the result over the N preceding rows, while the former
> emits the aggregate result over all elements seen so far. So I suggest
> adding them for more complete semantics.
>
> Regarding the windowed aggregate task, I agree with that, and I'm looking
> forward to seeing the corresponding JIRA issues created as soon as possible.
> After that, we can start working on an independent branch without waiting
> for the 1.1 release. But I'm still a little concerned about Calcite's
> support, as we must wait for Calcite to support the corresponding syntax
> and to release a version with it. If we can separate the task into Table
> API and SQL parts, we may not be blocked by Calcite too much.
>
> What do you think?
>
> - Jark Wu
>
> > On Jun 16, 2016, at 8:37 PM, Fabian Hueske wrote:
> >
> > Hi Jark,
> >
> > thanks for sharing Blink's Streaming Table API. It seems to be close to
> the
> > DataStream API, while the Table API draft I shared is more similar to
> > Calcite's proposal.
> > You are right, the current draft does not include running (non-windowed)
> > aggregates. We were not sure how useful these are since these aggregates
> > are unbound and might become meaningless after being applied on a very
> long
> > stream. However, we can certainly add them, if users request them.
> > An alternative to running aggregates could be what I called "Row window
> > operators" in the streaming Table API draft. These operators emit an
> > aggregate for each incoming row, however the aggregate is bound to a
> > certain window around the row like the 10 rows preceding the row for
> which
> > the aggregate is computed. Calcite calls these windows "Sliding windows"
> > (Attention: This is different from Flink's terminology, in Flink sliding
> > windows are something different). Row windows are similar to running
> > aggregates in that they emit a row for each incoming row. You can also
> > think of them as a (Flink) sliding count window which is evaluated for
> each
> > incoming record.
> >
> > Further differences are the support of Scalar UDFs in the Table API and
> the
> > support for joins which have not been drafted for the Table API yet.
> > Scalar UDFs are definitely also on our roadmap and with upcoming support
> > for side inputs, the DataStream API will also support more types of
> joins.
> >
> > Regarding the current state of Stream SQL in Calcite I am not up to date.
> >
> > I would propose to start with the effort of adding support for windowed
> > aggregates as follows:
> >
> > 1) Add support to define a timestamp / watermark extractor to tables.
> This
> > includes defining a "quasi-monotone" column in a Table's schema. Calcite
> > will use this information to reason about the validity of a query (making
> > sure that grouping includes at least one monotone attribute).
> > 2) Add support for sorting a stream on the timestamp attribute. While
> > sorting itself is not very exciting, it is an easy operation and can be
> > immediately implemented without worrying about API questions. This will
> > also show how well Calcite supports the reasoning about monotone
> attributes.
> > 3) Add support for tumbling windows.
> >
> > In each of these steps we might need to get involved with the Calcite
> > community, depending on Calcite's current support for "quasi-monotone"
> > attributes, etc.
> >
> > What do you think?
> >
> > Best, Fabian
> >
> >
> > 2016-06-14 11:03 GMT+02:00 Jark Wu 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-17 Thread Jark Wu
Hi Fabian,

A lot of our business use cases rely on non-windowed aggregations. And there 
is a little difference between a non-windowed aggregate and the Row window 
operator: the latter is bound to a certain window and emits, for every incoming 
row, the result over the N preceding rows, while the former emits the aggregate 
result over all elements seen so far. So I suggest adding them for more 
complete semantics.

Regarding the windowed aggregate task, I agree with that, and I'm looking 
forward to seeing the corresponding JIRA issues created as soon as possible. 
After that, we can start working on an independent branch without waiting for 
the 1.1 release. But I'm still a little concerned about Calcite's support, as 
we must wait for Calcite to support the corresponding syntax and to release a 
version with it. If we can separate the task into Table API and SQL parts, we 
may not be blocked by Calcite too much.

What do you think? 

- Jark Wu 

> On Jun 16, 2016, at 8:37 PM, Fabian Hueske wrote:
> 
> Hi Jark,
> 
> thanks for sharing Blink's Streaming Table API. It seems to be close to the
> DataStream API, while the Table API draft I shared is more similar to
> Calcite's proposal.
> You are right, the current draft does not include running (non-windowed)
> aggregates. We were not sure how useful these are since these aggregates
> are unbound and might become meaningless after being applied on a very long
> stream. However, we can certainly add them, if users request them.
> An alternative to running aggregates could be what I called "Row window
> operators" in the streaming Table API draft. These operators emit an
> aggregate for each incoming row, however the aggregate is bound to a
> certain window around the row like the 10 rows preceding the row for which
> the aggregate is computed. Calcite calls these windows "Sliding windows"
> (Attention: This is different from Flink's terminology, in Flink sliding
> windows are something different). Row windows are similar to running
> aggregates in that they emit a row for each incoming row. You can also
> think of them as a (Flink) sliding count window which is evaluated for each
> incoming record.
> 
> Further differences are the support of Scalar UDFs in the Table API and the
> support for joins which have not been drafted for the Table API yet.
> Scalar UDFs are definitely also on our roadmap and with upcoming support
> for side inputs, the DataStream API will also support more types of joins.
> 
> Regarding the current state of Stream SQL in Calcite I am not up to date.
> 
> I would propose to start with the effort of adding support for windowed
> aggregates as follows:
> 
> 1) Add support to define a timestamp / watermark extractor to tables. This
> includes defining a "quasi-monotone" column in a Table's schema. Calcite
> will use this information to reason about the validity of a query (making
> sure that grouping includes at least one monotone attribute).
> 2) Add support for sorting a stream on the timestamp attribute. While
> sorting itself is not very exciting, it is an easy operation and can be
> immediately implemented without worrying about API questions. This will
> also show how well Calcite supports the reasoning about monotone attributes.
> 3) Add support for tumbling windows.
> 
> In each of these steps we might need to get involved with the Calcite
> community, depending on Calcite's current support for "quasi-monotone"
> attributes, etc.
> 
> What do you think?
> 
> Best, Fabian
> 
> 
> 2016-06-14 11:03 GMT+02:00 Jark Wu :
> 
>> Hi Fabian,
>> 
>> It’s great to hear that we are going to start it!
>> 
>> I'm glad to share our current Streaming Table API [1]. I notice that all
>> aggregation functions are scoped to a defined window in the Flink Stream
>> Table API design [2] and the Calcite StreamSQL design [3]. I'm wondering
>> whether we also need global aggregation, i.e., aggregation that is applied
>> only on the grouped key without any window, as the DataStream API supports
>> with `datastream.keyBy(f1).sum(f2)`.
>>
>> Since the window syntax of StreamSQL is not implemented yet, will we help
>> the Calcite community with that first, or work on the window + aggregation
>> Table API first?
>> 
>> 
>> [1]
>> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing
>> <
>> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing
>>> 
>> [2]
>> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#
>> <
>> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#
>>> 
>> [3] https://calcite.apache.org/docs/stream.html#tumbling-windows <
>> https://calcite.apache.org/docs/stream.html#tumbling-windows>
>> 
>> 
>> - Jark Wu
>> 
>>> On Jun 14, 2016, at 1:10 AM, Fabian Hueske wrote:
>>> 
>>> Hi Jark,
>>> 
>>> wow, that's good news!
>>> You are right, the streaming Table API is currently very 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-16 Thread Fabian Hueske
Hi Jark,

thanks for sharing Blink's Streaming Table API. It seems to be close to the
DataStream API, while the Table API draft I shared is more similar to
Calcite's proposal.
You are right, the current draft does not include running (non-windowed)
aggregates. We were not sure how useful these are since these aggregates
are unbound and might become meaningless after being applied on a very long
stream. However, we can certainly add them, if users request them.
An alternative to running aggregates could be what I called "Row window
operators" in the streaming Table API draft. These operators emit an
aggregate for each incoming row, however the aggregate is bound to a
certain window around the row like the 10 rows preceding the row for which
the aggregate is computed. Calcite calls these windows "Sliding windows"
(Attention: This is different from Flink's terminology, in Flink sliding
windows are something different). Row windows are similar to running
aggregates in that they emit a row for each incoming row. You can also
think of them as a (Flink) sliding count window which is evaluated for each
incoming record.
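
To relate this to the DataStream API, such a row window over the 10 preceding
rows corresponds roughly to a keyed sliding count window of size 10 with slide
1, which fires for every incoming element. A minimal Scala sketch with made-up
field names:

    import org.apache.flink.streaming.api.scala._

    // emits an aggregate for each incoming element, computed over (at most)
    // that element and the 9 preceding elements with the same key
    case class Reading(sensorId: String, temperature: Double)
    val readings: DataStream[Reading] = ???   // some source
    val rowWindowLike = readings
      .keyBy(_.sensorId)
      .countWindow(10, 1)                     // size 10, slide 1
      .sum("temperature")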

Further differences are the support of Scalar UDFs in the Table API and the
support for joins which have not been drafted for the Table API yet.
Scalar UDFs are definitely also on our roadmap and with upcoming support
for side inputs, the DataStream API will also support more types of joins.

Regarding the current state of Stream SQL in Calcite I am not up to date.

I would propose to start with the effort of adding support for windowed
aggregates as follows:

1) Add support to define a timestamp / watermark extractor to tables. This
includes defining a "quasi-monotone" column in a Table's schema. Calcite
will use this information to reason about the validity of a query (making
sure that grouping includes at least one monotone attribute).
2) Add support for sorting a stream on the timestamp attribute. While
sorting itself is not very exciting, it is an easy operation and can be
immediately implemented without worrying about API questions. This will
also show how well Calcite supports the reasoning about monotone attributes.
3) Add support for tumbling windows.

In each of these steps we might need to get involved with the Calcite
community, depending on Calcite's current support for "quasi-monotone"
attributes, etc.
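
As a point of reference for step 1, this is roughly how a monotone event-time
column is established on a DataStream today. How the corresponding column would
be declared in a Table's schema is exactly the open design question, and all
names below are only illustrative:

    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.scala._

    case class Sensor(rowtime: Long, sensorId: String, temperature: Double)

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val sensors: DataStream[Sensor] = env.fromElements(
      Sensor(1000L, "s1", 20.5), Sensor(2000L, "s1", 21.0))

    // declare rowtime as an ascending (quasi-monotone) event-time attribute
    val withEventTime = sensors.assignAscendingTimestamps(_.rowtime)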

What do you think?

Best, Fabian


2016-06-14 11:03 GMT+02:00 Jark Wu :

> Hi Fabian,
>
> It’s great to hear that we are going to start it!
>
> I'm glad to share our current Streaming Table API [1]. I notice that all
> aggregation functions are scoped to a defined window in the Flink Stream
> Table API design [2] and the Calcite StreamSQL design [3]. I'm wondering
> whether we also need global aggregation, i.e., aggregation that is applied
> only on the grouped key without any window, as the DataStream API supports
> with `datastream.keyBy(f1).sum(f2)`.
>
> Since the window syntax of StreamSQL is not implemented yet, will we help
> the Calcite community with that first, or work on the window + aggregation
> Table API first?
>
>
> [1]
> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing
> <
> https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing
> >
> [2]
> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#
> <
> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#
> >
> [3] https://calcite.apache.org/docs/stream.html#tumbling-windows <
> https://calcite.apache.org/docs/stream.html#tumbling-windows>
>
>
> - Jark Wu
>
> > On Jun 14, 2016, at 1:10 AM, Fabian Hueske wrote:
> >
> > Hi Jark,
> >
> > wow, that's good news!
> > You are right, the streaming Table API is currently very limited. In the
> > last months we changed the internal architecture and put everything on
> top
> > of Apache Calcite.
> > For the upcoming 1.1 release, we won't add new features to the Table API
> /
> > SQL. However, for the 1.2 release we plan to focus on the streaming
> > Table API and Stream SQL to add support for windowed aggregates and
> joins,
> > which corresponds to Task 7 and 9 in the design document. You are
> > completely right, that we should start to add tickets to JIRA for this. I
> > will do that tomorrow.
> >
> > It is great that you have already working code for windowed aggregates
> and
> > joins! Here is a link to our current API draft for windows in the Table
> API
> > [1]. It would be great if you could share what your API looks like. As you
> > said, the internals have changed a lot by now, but we might want to reuse
> > your API for Table API windows and maybe the code of the runtime.
> However,
> > we need to go through Calcite for optimization and SQL support, so some
> > parts need to be definitely changed. Stream SQL is also on the roadmap of
> > the Calcite community, but it might be that some 

Re: How to contribute to Streaming Table API and StreamSQL

2016-06-14 Thread Jark Wu
Hi Fabian, 

It’s great to hear that we are going to start it! 

I'm glad to share our current Streaming Table API [1]. I notice that all 
aggregation functions are scoped to a defined window in the Flink Stream Table 
API design [2] and the Calcite StreamSQL design [3]. I'm wondering whether we 
also need global aggregation, i.e., aggregation that is applied only on the 
grouped key without any window, as the DataStream API supports with 
`datastream.keyBy(f1).sum(f2)`.
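
To make this concrete, a rough sketch with made-up field names (the Table API
line is purely hypothetical and not part of the current draft):

    import org.apache.flink.streaming.api.scala._

    // DataStream API today: a running (global) aggregate per key that emits an
    // updated sum for every incoming element, without any window
    case class Click(userId: String, cnt: Long)
    val clicks: DataStream[Click] = ???   // some source
    val runningSums = clicks.keyBy(_.userId).sum("cnt")

    // hypothetical Table API counterpart without a window:
    //   clicksTable.groupBy('userId).select('userId, 'cnt.sum)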

Since the window syntax of StreamSQL is not implemented yet, will we help the 
Calcite community with that first, or work on the window + aggregation Table 
API first?


[1] 
https://docs.google.com/document/d/1KMUzvBAWSyQ39T8MyxUi0zNHyvLUnyGMPA7_RLSDpFw/edit?usp=sharing
 

[2] 
https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#
 

[3] https://calcite.apache.org/docs/stream.html#tumbling-windows 



- Jark Wu 

> On Jun 14, 2016, at 1:10 AM, Fabian Hueske wrote:
> 
> Hi Jark,
> 
> wow, that's good news!
> You are right, the streaming Table API is currently very limited. In the
> last months we changed the internal architecture and put everything on top
> of Apache Calcite.
> For the upcoming 1.1 release, we won't add new features to the Table API /
> SQL. However, for the 1.2 release we plan to focus on the streaming
> Table API and Stream SQL to add support for windowed aggregates and joins,
> which corresponds to Task 7 and 9 in the design document. You are
> completely right, that we should start to add tickets to JIRA for this. I
> will do that tomorrow.
> 
> It is great that you have already working code for windowed aggregates and
> joins! Here is a link to our current API draft for windows in the Table API
> [1]. It would be great if you could share what your API looks like. As you
> said, the internals have changed a lot by now, but we might want to reuse
> your API for Table API windows and maybe the code of the runtime. However,
> we need to go through Calcite for optimization and SQL support, so some
> parts need to be definitely changed. Stream SQL is also on the roadmap of
> the Calcite community, but it might be that some features that we will need
> are not completed yet. So, maybe we help the Calcite community with that as
> well.
> 
> If you want to contribute, you should first read the how to contribute
> guide [2] and guide for code contributions [3].
> The general rule is to first open a JIRA and later a pull request. Changes
> that are extensive or modify current behavior (except bugs) should be
> discussed before starting to work on them.
> 
> Looking forward to working with you on Flink,
> Fabian
> 
> [1]
> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#heading=h.3iw7frfjdcb2
> [2] http://flink.apache.org/how-to-contribute.html
> [3] http://flink.apache.org/contribute-code.html
> 
> 
> 2016-06-13 11:31 GMT+02:00 Jark Wu :
> 
>> Hi,
>> 
>> We have a great interest in the new Table API & SQL. In Alibaba, we have
>> added a lot of features (groupBy, agg, window, join, UDF …) to Streaming
>> Table API (based on Flink 1.0). Now, many jobs run on the Table API in
>> our production environment. But we want to keep pace with the community, and we
>> have noticed that Flink Community reworked the Table API and also supported
>> SQL. That is really cool. However, the Streaming Table API is still so
>> weak. So we want to contribute to accelerate the Streaming Table API and
>> StreamSQL growth.
>> 
>> It seems that we have completed Task-5 and Task-6 mentioned in the Work
>> Plan <
>> https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/edit#>.
>> So can we start Task-7 and Task-9 now? Are there any more specific plans? I
>> think it's better to create an umbrella JIRA like FLINK-3221 to make the
>> development plan clearer.
>> 
>> If I want to contribute code for groupBy and agg function, what should I
>> do? As I didn't find related JIRAs, can I create a JIRA and open a pull request
>> directly?
>> 
>> Sorry for so many questions at a time.
>> 
>> 
>> 
>> - Jark Wu (wuchong)
>> 
>> 



Re: How to contribute to Streaming Table API and StreamSQL

2016-06-13 Thread Fabian Hueske
Hi Jark,

wow, that's good news!
You are right, the streaming Table API is currently very limited. In the
last months we changed the internal architecture and put everything on top
of Apache Calcite.
For the upcoming 1.1 release, we won't add new features to the Table API /
SQL. However, for the 1.2 release we plan to focus on the streaming
Table API and Stream SQL to add support for windowed aggregates and joins,
which corresponds to Task 7 and 9 in the design document. You are
completely right, that we should start to add tickets to JIRA for this. I
will do that tomorrow.

It is great that you have already working code for windowed aggregates and
joins! Here is a link to our current API draft for windows in the Table API
[1]. It would be great if you could share what your API looks like. As you
said, the internals have changed a lot by now, but we might want to reuse
your API for Table API windows and maybe the code of the runtime. However,
we need to go through Calcite for optimization and SQL support, so some
parts need to be definitely changed. Stream SQL is also on the roadmap of
the Calcite community, but it might be that some features that we will need
are not completed yet. So, maybe we help the Calcite community with that as
well.

If you want to contribute, you should first read the how to contribute
guide [2] and guide for code contributions [3].
The general rule is to first open a JIRA and later a pull request. Changes
that are extensive or modify current behavior (except bugs) should be
discussed before starting to work on them.

Looking forward to working with you on Flink,
Fabian

[1]
https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o3AyCh2ePqr3V5E/edit#heading=h.3iw7frfjdcb2
[2] http://flink.apache.org/how-to-contribute.html
[3] http://flink.apache.org/contribute-code.html


2016-06-13 11:31 GMT+02:00 Jark Wu :

> Hi,
>
> We have a great interest in the new Table API & SQL. In Alibaba, we have
> added a lot of features (groupBy, agg, window, join, UDF …) to Streaming
> Table API (based on Flink 1.0). Now, many jobs run on the Table API in
> our production environment. But we want to keep pace with the community, and we
> have noticed that Flink Community reworked the Table API and also supported
> SQL. That is really cool. However, the Streaming Table API is still so
> weak. So we want to contribute to accelerate the Streaming Table API and
> StreamSQL growth.
>
> It seems that we have completed Task-5 and Task-6 mentioned in the Work
> Plan <
> https://docs.google.com/document/d/1TLayJNOTBle_-m1rQfgA6Ouj1oYsfqRjPcp1h2TVqdI/edit#>.
> So can we start Task-7 and Task-9 now? Are there any more specific plans? I
> think it's better to create an umbrella JIRA like FLINK-3221 to make the
> development plan clearer.
>
> If I want to contribute code for groupBy and agg function, what should I
> do? As I didn't find related JIRAs, can I create a JIRA and open a pull request
> directly?
>
> Sorry for so many questions at a time.
>
>
>
> - Jark Wu (wuchong)
>
>