Re: SQL on Flink

2015-05-27 Thread Ufuk Celebi

On 27 May 2015, at 17:05, Timo Walther twal...@apache.org wrote:

 It's passion for the future of the project rather than passion for SQL ;-)
 
 I always try to think like someone from industry. And IMO the people in 
 industry are still thinking in SQL. If you want to persuade someone coming 
 from the SQL world, you should offer a SQL interface to run legacy code first 
 (similar to Hadoop operators). Rewriting old queries in the Table API is not 
 very convenient.
 
 I share Stephan's opinion. Building both APIs concurrently would act as a good 
 source to test and extend the Table API. Currently, the Table API is 
 half-done, but I think the goal is to have SQL functionality. I can implement 
 a SQL operator and extend the Table API if functionality is missing.

Very exciting! :-) +1

As suggested, I think the best thing is to do this hand-in-hand with the Table 
API. I don't think that there was any real disagreement. Everyone agrees that 
the SQL layer should be built on top of the Table API, which is great for both 
the Table API and the SQL layer. :-)



Re: SQL on Flink

2015-05-27 Thread Kostas Tzoumas
Very excited to see this starting!

On Wed, May 27, 2015 at 6:06 PM, Ufuk Celebi u...@apache.org wrote:






Re: SQL on Flink

2015-05-27 Thread Fabian Hueske
IMO, it is better to have one feature that is reasonably well developed
instead of two half-baked features. That's why I proposed to advance the
Table API a bit further before starting the next big thing. I played around
with the Table API recently and I think it definitely needs a bit more
contributor attention and more features to be actually usable. Also, since
all features of the SQL interface need to be included in the Table API
(given we follow the SQL-on-Table approach), it makes sense IMO to push the
Table API a bit further before going for the next thing.

2015-05-27 16:06 GMT+02:00 Stephan Ewen se...@apache.org:

 I see no reason why a SQL interface cannot be bootstrapped concurrently.
 It would initially not support many operations,
 but would act as a good source to test and drive functionality from the
 Table API.


 @Ted:

 I would like to learn a bit more about the stack and internal abstractions
 of Drill. It may make sense to reuse some of the query execution operators
 from Drill. I especially like the learning-schema-on-the-fly part of Drill.

 Flink DataSets and Streams have a schema, but it may in several cases be a
 schema lower bound, like the greatest common superclass.
 Those cases may benefit big time from Drill's ability to refine schema on
 the fly.

 That may be useful also in the Table API, making it again available to
 LINQ-like programs and SQL scripts.
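
To make the "schema lower bound" point concrete, here is a plain-Java
sketch. Event, Click, and Purchase are made-up types, not Flink classes;
the point is only that the static element type is the common superclass,
and recovering the concrete per-record shape is what Drill-style schema
refinement would add:

import java.util.Arrays;
import java.util.List;

public class SchemaLowerBound {

    // Illustrative event hierarchy (not Flink classes).
    static class Event { long timestamp; }
    static class Click extends Event { String url; }
    static class Purchase extends Event { double amount; }

    public static void main(String[] args) {
        // The static element type is the greatest common superclass,
        // a "schema lower bound". The concrete fields (url, amount)
        // are invisible at this level.
        List<Event> records = Arrays.asList(new Click(), new Purchase());

        // Refining the schema on the fly amounts to discovering the
        // concrete shape of each record at runtime.
        for (Event e : records) {
            if (e instanceof Click) {
                System.out.println("click record: has url");
            } else if (e instanceof Purchase) {
                System.out.println("purchase record: has amount");
            }
        }
    }
}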

 On Wed, May 27, 2015 at 3:49 PM, Robert Metzger rmetz...@apache.org
 wrote:

  I didn't know that paper...  Thanks for sharing.
 
   I worked on a SQL layer for Stratosphere some time ago, using Apache
   Calcite (called Optiq back then). I think the project provides a lot of
   very good tooling for creating a SQL layer. So if we decide to go for SQL
   on Flink, I would suggest using Calcite.
   I can also help you a bit with Calcite to get started with it.
 
  I agree with Fabian that it would probably make more sense for now to
  enhance the Table API.
   I think the biggest limitation right now is that it only supports POJOs.
   We should also support Tuples (I know that's difficult to do), data from
   HCatalog (that includes Parquet & ORC), JSON, ...
   Then, I would add filter and projection pushdown into the Table API.
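
For reference, a minimal sketch of Calcite's parser entry point. The
class and method names are Calcite's public API; the query and table
names are made up:

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class CalciteParseExample {
    public static void main(String[] args) throws SqlParseException {
        // Calcite parses the SQL text into a SqlNode tree; a SQL layer
        // for Flink would translate that tree into Table API calls.
        SqlParser parser = SqlParser.create(
            "SELECT name, SUM(amount) FROM orders WHERE amount > 10 GROUP BY name");
        SqlNode ast = parser.parseQuery();
        System.out.println(ast);
    }
}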
 
 
 
  On Tue, May 26, 2015 at 10:03 PM, Ted Dunning ted.dunn...@gmail.com
  wrote:
 
    It would also be relatively simple (I think) to retarget Drill to Flink
    if Flink doesn't provide enough typing metadata to do traditional SQL.
  
  
  
    On Tue, May 26, 2015 at 12:52 PM, Fabian Hueske fhue...@gmail.com wrote:



Re: SQL on Flink

2015-05-27 Thread Timo Walther

It's passion for the future of the project rather than passion for SQL ;-)

I always try to think like someone from industry. And IMO the people 
from industry are still thinking in SQL. If you want to persuade someone 
coming from the SQL world, you should offer a SQL interface to run 
legacy code first (similar to Hadoop operators). Rewriting old queries 
in the Table API is not very convenient.


I share Stephan's opinion. Building both APIs concurrently would act as a 
good source to test and extend the Table API. Currently, the Table API 
is half-done, but I think the goal is to have SQL functionality. I can 
implement a SQL operator and extend the Table API if functionality is 
missing.
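
To make the legacy-query point concrete, a sketch. The Table API calls
follow the expression-string style of the current flink-staging table
module; the Order POJO is made up, and the final sql() entry point is
purely hypothetical, it is exactly the interface this thread is about:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.table.TableEnvironment;
import org.apache.flink.api.table.Table;

public class LegacyQueryExample {

    // Illustrative input type; the Table API currently expects POJOs.
    public static class Order {
        public String name;
        public double amount;
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        TableEnvironment tableEnv = new TableEnvironment();

        DataSet<Order> orders = env.fromElements(new Order());
        Table table = tableEnv.fromDataSet(orders);

        // A legacy query that a SQL interface could run unchanged:
        //   SELECT name, SUM(amount) FROM orders GROUP BY name

        // The same query rewritten by hand against the Table API,
        // the translation step that is not very convenient:
        Table result = table
            .groupBy("name")
            .select("name, amount.sum as total");

        // The envisioned layer would remove the rewrite, e.g.
        // (hypothetical method): tableEnv.sql("SELECT name, ...");
        System.out.println(result);
    }
}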


On 27.05.2015 16:41, Fabian Hueske wrote:


Re: SQL on Flink

2015-05-27 Thread Fabian Hueske
+1 for committer passion!

Please don't get me wrong, I think SQL on Flink would be a great feature.
I just wanted to make the point that the Table API needs to mirror all SQL
features if SQL is implemented on top of the Table API.


2015-05-27 16:34 GMT+02:00 Kostas Tzoumas ktzou...@apache.org:

 



Re: SQL on Flink

2015-05-27 Thread Kostas Tzoumas
I think Fabian's arguments make a lot of sense.

However, if Timo *really wants* to start SQL on top of Table, that is what
he will do a great job at :-) As usual, we can keep it in beta status in
flink-staging until it is mature... and it will help create issues for the
Table API and give direction to its development. Perhaps we will have a
feature-poor SQL for a bit, then switch to hardening the Table API to
support more features and then back to SQL.

I'm just advocating for committer passion-first here :-) Perhaps Timo
should weigh in.

On Wed, May 27, 2015 at 4:19 PM, Fabian Hueske fhue...@gmail.com wrote:




Re: SQL on Flink

2015-05-27 Thread Aljoscha Krettek
+1 to what Ufuk said. :D
On May 27, 2015 6:13 PM, Kostas Tzoumas ktzou...@apache.org wrote:

 



Re: SQL on Flink

2015-05-26 Thread Fabian Hueske
Hi,

Flink's Table API is pretty close to what SQL provides. IMO, the best
approach would be to leverage that and build a SQL parser (maybe together
with a logical optimizer) on top of the Table API. Parser (and optimizer)
could be built using Apache Calcite, which provides exactly this.
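
A rough sketch of that layering. Calcite's SqlSelect accessors are its
real API; the translation shown handles only a trivial SELECT-FROM-WHERE
shape and naively reuses the SQL text as Table API expression strings,
which only works where the two syntaxes happen to match. A real
implementation would walk the full SqlNode tree and optimize in between:

import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.flink.api.table.Table;

public class SqlOnTableSketch {

    // Translate "SELECT cols FROM t WHERE cond" into Table API calls.
    public static Table translate(String sql, Table input) throws Exception {
        SqlSelect select = (SqlSelect) SqlParser.create(sql).parseQuery();

        Table result = input;
        if (select.getWhere() != null) {
            // Reuse the predicate's SQL text as a Table API expression
            // string (a simplification, as noted above).
            result = result.filter(select.getWhere().toString());
        }
        return result.select(select.getSelectList().toString());
    }
}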

Since the Table API is still a fairly new component and not very
feature-rich, it might make sense to extend and strengthen it before putting
something major on top.

Cheers, Fabian

2015-05-26 21:38 GMT+02:00 Timo Walther twal...@apache.org:

 Hey everyone,

 I would be interested in having a complete SQL API in Flink. What is the
 status there? Is someone already working on it? If not, I would like to
 work on it. I found http://ijcsi.org/papers/IJCSI-12-1-1-169-174.pdf but
 I couldn't find anything on the mailing list or Jira. Otherwise I would
 open an issue and start a discussion about it there.

 Regards,
 Timo