Re: Support SqlStreaming in spark

2019-06-03 Thread Stavros Kontopoulos
Hi all,
From what I have read, there is an effort to standardize streaming SQL
globally (people from Flink, Google, and others are working with the SQL
standardization body): https://arxiv.org/abs/1905.12133v1
Should the Spark community be part of it?

Best,
Stavros



Re: Support SqlStreaming in spark

2019-03-28 Thread uncleGen
Hi all, 

I have rewritten the design doc based on the previous discussion.
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0

Would be interested to hear what others think.

Regards,
Genmao Yu 



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Support SqlStreaming in spark

2019-02-10 Thread sujith chacko
Hi All,

The design document has been updated since the previous version, which
several people reviewed and commented on. I request everyone to review the
design document and help us baseline the design for the SPIP
'Support SQL streaming' in Spark Structured Streaming. A few more sections
have been added to handle scenarios such as the following:

1) Passing stream-level configurations to the SQL command instead of
setting them at the session/application level.

2) Supporting multiple streams in a single application, etc.
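
To illustrate point 1, here is a minimal Python sketch of how per-statement
options could take precedence over session-level settings. It is not code
from the design document or from Spark; the function name
`effective_stream_conf` and the option keys are hypothetical and chosen
only for illustration.

```python
from typing import Dict


def effective_stream_conf(session_conf: Dict[str, str],
                          statement_options: Dict[str, str]) -> Dict[str, str]:
    """Merge settings so that options given on one statement win over
    session-wide defaults, without modifying the session settings."""
    merged = dict(session_conf)       # copy: the session stays untouched
    merged.update(statement_options)  # statement-level values override
    return merged


# Hypothetical keys, for illustration only.
session = {"checkpointLocation": "/tmp/ckpt/default", "trigger": "10 seconds"}
stream_a = effective_stream_conf(session, {"checkpointLocation": "/tmp/ckpt/a"})
stream_b = effective_stream_conf(session, {"trigger": "1 minute"})
```

Because each stream gets its own merged settings, two streams in one
application can use different checkpoint locations, which also relates to
point 2.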

Link to the design document

https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#


A few questions have already been clarified by Jacky; please see the link below:

https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#heading=h.t96f9l205fk1


Regards,
Sujith


Re: Support SqlStreaming in spark

2018-12-27 Thread JackyLee
Hi, Wenchen

Thank you for recognizing the value of SQL on streaming. I have written the
SQLStreaming design document:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#

Your questions are answered here:
https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0/edit#heading=h.t96f9l205fk1

There may be some details that I have not considered; we can discuss them in
more depth.

Thanks



Re: Support SqlStreaming in spark

2018-12-25 Thread JackyLee
No problem



Re: Support SqlStreaming in spark

2018-12-24 Thread Wenchen Fan
Hi JackyLee,

Can you put the answers to these questions in the design doc?

For example, if we don't want to support manipulating a streaming query, is
`SELECT STREAM ...` a blocking action? How can users create a Spark
application with multiple streaming jobs? How can users run Structured
Streaming interactively?


Re: Support SqlStreaming in spark

2018-12-21 Thread JackyLee
Hi Wenchen,
I have been working on SQLStreaming for a year, and I have promoted it
within my company.
I have looked at the designs of Kafka and Calcite, and I believe my design
is better than theirs. They support pure SQL but not a table API for
streaming. Users can only use the dedicated streaming statements, and the
same statement can't run batch queries.
In my opinion, the Table API is actually the key to solving SQLStreaming;
pure SQL is just another expression of the streaming Table API.



Re: Support SqlStreaming in spark

2018-12-21 Thread JackyLee
Hi Wenchen and Arun Mahadevan,
Thanks for your replies.

SQLStreaming is not just a way to support pure SQL, but also a way to
define a table API for streaming.
I have redefined SQLStreaming so that it supports a table API. Users can
use either SQL or the table API to run SQLStreaming.

I will update the design document of SQLStreaming. Could you help me
improve the design doc?

Again, thanks for your attention.



Re: Support SqlStreaming in spark

2018-12-21 Thread Arun Mahadevan
There have been efforts to come up with a unified syntax for streaming (see
[1] [2]), but I guess there will be differences based on the streaming
features supported by each system.

I agree that it needs a detailed design, and it can be as close to the Spark
batch SQL syntax as possible.

Also, I am not sure whether it is possible, or makes sense, to express all
the operations via pure SQL. For example, query start/stop, triggers,
watermarks, etc. might be better expressed via APIs.

[1]
https://docs.google.com/document/d/1wrla8mF_mmq-NW9sdJHYVgMyZsgCmHumJJ5f5WUzTiM/edit#heading=h.vfrf26d6b3ne
[2] https://calcite.apache.org/docs/stream.html



Re: Support SqlStreaming in spark

2018-12-21 Thread Wenchen Fan
It would be great to add pure-SQL support to Structured Streaming. It goes
without saying how important SQL support is, but we should complete the
design first.

Looking at the Kafka streaming syntax, it has CREATE STREAM and WINDOW
TUMBLING. Shall we check other streaming systems with SQL support, and
justify the places where we are going to differ?

We should also take into account the full lifecycle:
1. how to restart a streaming query from checkpoint?
2. how to stop a streaming query?
3. how to check status/progress of a streaming query?
4. ...

Basically, we should check what functions the DataStreamReader/Writer API
supports, and see if we can support them with SQL as well.
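
As a rough illustration of the lifecycle questions above, the following
Python sketch models a streaming query that can be started, stopped,
inspected, and restarted from a checkpoint. It is a toy in-memory model, not
Spark's API; the class `ToyStreamingQuery` and its methods are hypothetical.

```python
from typing import Dict, List


class ToyStreamingQuery:
    """In-memory stand-in for a streaming query with a checkpoint."""

    def __init__(self, name: str, source: List[int], checkpoint: Dict[str, int]):
        self.name = name
        self.source = source
        self.checkpoint = checkpoint      # shared store, survives stop/restart
        self.active = False
        self.results: List[int] = []

    def start(self) -> None:
        self.active = True

    def process_batch(self, max_records: int) -> None:
        """Process up to max_records new records, then commit the offset."""
        if not self.active:
            raise RuntimeError("query is not running")
        offset = self.checkpoint.get(self.name, 0)
        batch = self.source[offset:offset + max_records]
        self.results.extend(x * 2 for x in batch)
        self.checkpoint[self.name] = offset + len(batch)

    def stop(self) -> None:
        self.active = False

    def status(self) -> Dict[str, object]:
        return {"name": self.name, "active": self.active,
                "offset": self.checkpoint.get(self.name, 0)}


checkpoint: Dict[str, int] = {}
data = [1, 2, 3, 4, 5]

q1 = ToyStreamingQuery("doubler", data, checkpoint)
q1.start()
q1.process_batch(max_records=2)   # handles records 1 and 2
q1.stop()

# A new query object with the same name resumes from the saved offset.
q2 = ToyStreamingQuery("doubler", data, checkpoint)
q2.start()
q2.process_batch(max_records=10)  # handles records 3, 4 and 5
```

A SQL interface would need some way to express each of these steps
(start, stop, status, restart), which is the gap the questions above point at.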


Thanks for your proposal!
Wenchen


Re: Support SqlStreaming in spark

2018-10-21 Thread JackyLee
The code of SQLStreaming has been pushed:

https://github.com/apache/spark/pull/22575



Re: Support SqlStreaming in spark

2018-06-28 Thread JackyLee
Spark JIRA:
https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630

Benefits:

First, users who are unfamiliar with streaming can easily use SQL to run
Structured Streaming, especially when migrating offline tasks to real-time
processing tasks.
Second, supporting a SQL API in Structured Streaming also makes it possible
to combine Structured Streaming with Hive. Users can store the source/sink
metadata in a table and use the Hive metastore to manage it. Users who want
to read this data can easily create a stream by accessing the table, which
can greatly reduce the development and maintenance costs of Structured
Streaming.
Finally, it becomes easy to achieve unified management and access control of
sources and sinks, and the management of private data becomes more
controllable, especially in areas such as finance or security.



Re: Support SqlStreaming in spark

2018-06-27 Thread Shixiong(Ryan) Zhu
Structured Streaming supports the same standard SQL as batch queries, so
users can switch their queries between batch and streaming easily. Could
you clarify what problems SqlStreaming solves and what the benefits of
the new syntax are?
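
The idea that one query definition can serve both modes can be sketched
without Spark. The following toy Python example (hypothetical names, not
Spark's API) runs the same grouped count once over a complete data set and
once incrementally over micro-batches, and arrives at the same result.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (user, amount)


def count_large_orders(rows: Iterable[Row], state: Counter) -> Counter:
    """The 'query': count rows per user where amount > 10.
    New rows are folded into the given running state."""
    for user, amount in rows:
        if amount > 10:
            state[user] += 1
    return state


rows: List[Row] = [("ann", 5), ("bob", 20), ("ann", 30), ("bob", 40), ("cy", 11)]

# Batch: one pass over all rows.
batch_result = count_large_orders(rows, Counter())

# Streaming: the same function applied to micro-batches of two rows,
# carrying the state from one micro-batch to the next.
stream_state: Counter = Counter()
for start in range(0, len(rows), 2):
    count_large_orders(rows[start:start + 2], stream_state)
```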

Best Regards,
Ryan



Re: Re: Support SqlStreaming in spark

2018-06-15 Thread Hadrien Chicault
Unsubscribe


Re:Re: Support SqlStreaming in spark

2018-06-15 Thread stc
The repo you pointed to may solve some of the SqlStreaming problems, but it
is not user-friendly enough: users need to learn a new syntax.


--

Jacky Lee
Mail:qcsd2...@163.com


At 2018-06-15 11:48:01, "Bowden, Chris"  wrote:


Not sure if there is a question in here, but if you are hinting that structured 
streaming should support a sql interface, spark has appropriate extensibility 
hooks to make it possible. However, the most powerful construct in structured 
streaming is quite difficult to find a sql equivalent for (e.g., 
flatMapGroupsWithState). This repo could use some cleanup but is an example of 
providing a sql interface to a subset of structured streaming's functionality: 
https://github.com/vertica/pstl/blob/master/pstl/src/main/antlr4/org/apache/spark/sql/catalyst/parser/pstl/PstlSqlBase.g4.



Support SqlStreaming in spark

2018-06-14 Thread JackyLee
Hello 

Nowadays, more and more streaming products support SQL streaming, such as
Kafka SQL (KSQL), Flink SQL, and Storm SQL. Supporting SQL streaming can not
only lower the barrier to entry for streaming, but also make streaming
easier for everyone to adopt.

At present, Structured Streaming is relatively mature, and it is based on
the Dataset API, which makes it possible to provide a SQL entry point for
Structured Streaming and to run Structured Streaming in SQL.

To support SQL streaming, there are two key points:
1. The analysis phase should be able to parse streaming-type SQL.
2. The analyzer should be able to map metadata information to the
corresponding Relation.
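
The two key points can be sketched as a toy example. This is not the
SQLStreaming implementation or Spark's analyzer; the handling of the
`STREAM` keyword, the catalog layout, and all names below are assumptions
made for illustration.

```python
import re
from typing import Dict, NamedTuple


class Relation(NamedTuple):
    table: str
    source_format: str
    streaming: bool


# Hypothetical catalog: table name -> source/sink metadata.
CATALOG: Dict[str, Dict[str, str]] = {
    "clicks": {"format": "kafka", "topic": "clicks"},
    "users": {"format": "parquet", "path": "/data/users"},
}

_SELECT = re.compile(r"^\s*SELECT\s+(STREAM\s+)?.+?\s+FROM\s+(\w+)",
                     re.IGNORECASE | re.DOTALL)


def resolve(sql: str) -> Relation:
    """Step 1: recognise a (streaming) SELECT and extract the table name.
    Step 2: map the table's catalog metadata to a relation."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    is_stream = match.group(1) is not None
    table = match.group(2)
    if table not in CATALOG:
        raise KeyError(f"table not found: {table}")
    return Relation(table, CATALOG[table]["format"], is_stream)


stream_rel = resolve("SELECT STREAM user_id FROM clicks")
batch_rel = resolve("select * from users")
```

In this sketch the same catalog entry could back either a batch or a
streaming relation; only the statement decides which one is requested.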

Running Structured Streaming in SQL can bring some benefits:
1. It lowers the barrier to entry for Structured Streaming and attracts
users more easily.
2. It encapsulates the metadata of a source or sink in a table, so that the
metadata can be maintained and managed uniformly and is easier for users to
access.
3. Metadata permission management, which is based on Hive, allows tighter
control over the overall authorization scheme of Structured Streaming.

We have found some ways to solve this problem. It would be a pleasure to
discuss them with you.

Thanks,  

Jackey Lee


