Re: GitHub is out of order

2018-10-21 Thread Hyukjin Kwon
Yeah, please ignore my duplicate comments if there are any. I didn't realize
this was happening globally; I thought the problem was specific to me, so I
posted the same comments multiple times.

On Mon, Oct 22, 2018 at 12:40 PM, Dongjoon Hyun wrote:

> Hi, All.
>
> Currently, GitHub is out of order. Apache Spark repo is also affected.
> Newly filed pull requests to Apache Spark repository seem to disappear
> repeatedly, too.
>
> https://status.github.com/messages
>
> Bests,
> Dongjoon.
>


GitHub is out of order

2018-10-21 Thread Dongjoon Hyun
Hi, All.

Currently, GitHub is out of order. Apache Spark repo is also affected.
Newly filed pull requests to Apache Spark repository seem to disappear
repeatedly, too.

https://status.github.com/messages

Bests,
Dongjoon.


Re: queryable state & streaming

2018-10-21 Thread Jungtaek Lim
Spark doesn't seem to have a workaround other than writing output to
external storage, so +1 on having this.

My major concern about implementing queryable state in Structured Streaming
is: "Is all state available on the executors at any time while the query is
running?" Querying state shouldn't affect the running query. Given that state
can be huge and the default state provider loads state into memory, we may
not want to load one more redundant snapshot of the state; we want to always
read the "current state" that the query itself is using. (Of course,
queryable state should be read-only.)
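
To make the read-only "current state" idea concrete, here is a minimal
sketch in plain Python, not Spark code. The `StateStore` class and its
`read_only_view` method are hypothetical names invented for illustration;
they are not Spark's StateStore API. The sketch only shows that a read-only
view can wrap the live state without loading a second copy.

```python
from types import MappingProxyType


class StateStore:
    """Toy in-memory state store; a stand-in, not Spark's StateStore API."""

    def __init__(self):
        self._state = {}  # key -> value, owned by the running query

    def put(self, key, value):
        # Only the running query writes to the store.
        self._state[key] = value

    def read_only_view(self):
        # MappingProxyType wraps the live dict without copying it, so
        # readers always see the current state and cannot mutate it.
        return MappingProxyType(self._state)


store = StateStore()
store.put("user-1", 3)

view = store.read_only_view()
store.put("user-1", 4)  # the query keeps updating state

print(view["user-1"])  # the view reflects the update; no copy was made

try:
    view["user-2"] = 1  # writes through the view are rejected
except TypeError as err:
    print("read-only:", err)
```

The sketch ignores concurrency. A real implementation would also have to
make sure a reader never observes a half-committed batch.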

Regarding improving local state, I guess it would be ideal to leverage an
embedded DB, as Kafka Streams and Flink do. The difference would be not only
reading state from off-heap storage, but also how snapshots are taken and
deltas are stored. We may want to check that snapshotting works well with a
small batch interval, and find an alternative approach if it doesn't. This
sounds like a huge item that can be handled separately.
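
The snapshot-plus-delta idea can be sketched as follows, in plain Python
rather than Spark code. `DeltaSnapshotStore` and `snapshot_every` are
hypothetical names chosen for illustration, and deletions are left out for
brevity. Each commit stores only the keys changed in that batch, and a full
snapshot is written every few versions so that recovery replays only a
bounded number of deltas.

```python
import copy


class DeltaSnapshotStore:
    """Toy versioned store: one delta per batch, a snapshot every N batches."""

    def __init__(self, snapshot_every=3):
        self.snapshot_every = snapshot_every  # hypothetical tuning knob
        self.state = {}
        self.version = 0
        self.snapshots = {0: {}}  # version -> full copy of the state
        self.deltas = {}          # version -> {key: new value}
        self._pending = {}

    def put(self, key, value):
        self.state[key] = value
        self._pending[key] = value

    def commit(self):
        # Persist only what changed in this batch.
        self.version += 1
        self.deltas[self.version] = self._pending
        self._pending = {}
        # Periodically write a full snapshot to bound recovery time.
        if self.version % self.snapshot_every == 0:
            self.snapshots[self.version] = copy.deepcopy(self.state)
        return self.version

    def restore(self, version):
        # Start from the latest snapshot at or before `version`,
        # then replay the deltas that follow it.
        base = max(v for v in self.snapshots if v <= version)
        state = copy.deepcopy(self.snapshots[base])
        for v in range(base + 1, version + 1):
            state.update(self.deltas[v])
        return state


store = DeltaSnapshotStore(snapshot_every=3)
for batch in range(1, 6):
    store.put("count", batch)
    store.put(f"key-{batch}", batch * 10)
    store.commit()

restored = store.restore(4)  # snapshot at version 3 plus the delta for 4
print(restored)
```

The cost of a delta grows with the number of keys changed in a batch, while
the cost of a snapshot grows with the whole state. That is why snapshot
frequency matters more as the batch interval gets smaller.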

- Jungtaek Lim (HeartSaVioR)

On Sat, Dec 9, 2017 at 10:51 PM, Stavros Kontopoulos wrote:

> Nice, I was looking for a JIRA. I agree we should justify why we are
> building something. In that direction, here is what I have seen from my
> experience.
> People quite often use state within their streaming apps and may have large
> state (TBs). Shortening the pipeline by not having to copy data (to
> Cassandra, for example, for serving) is an advantage, at least in terms of
> latency and complexity.
> This can be true if we take advantage of state checkpointing (locally this
> could be RocksDB, or in general HDFS; the latter is currently supported)
> along with an API to efficiently query data.
> Some use cases I see:
>
> - real-time dashboards and real-time reporting, the faster the better
> - monitoring of state for operational reasons, app health etc...
> - integrating with external services via an API, e.g. making aggregations
> over time windows accessible to some third-party service within your
> system
>
> Regarding requirements, here are some of them:
> - support for an API to expose state (could be done at the Spark driver),
> such as REST
> - support for dynamic allocation (not sure how it affects state management)
> - an efficient way to talk to executors to get the state (RPC?)
> - making local state more efficient and more easily accessible with an
> embedded DB (I don't think this is supported from what I see, but I may be
> wrong)
> Some people are already working with such technologies, and some of that
> work could be reused: https://issues.apache.org/jira/browse/SPARK-20641
>
> Best,
> Stavros
>
>
> On Fri, Dec 8, 2017 at 10:32 PM, Michael Armbrust wrote:
>
>> https://issues.apache.org/jira/browse/SPARK-16738
>>
>> I don't believe anyone is working on it yet.  I think the most useful
>> thing is to start enumerating requirements and use cases and then we can
>> talk about how to build it.
>>
>> On Fri, Dec 8, 2017 at 10:47 AM, Stavros Kontopoulos <
>> st.kontopou...@gmail.com> wrote:
>>
>>> Cool. Burak, do you have a pointer? Should I take the initiative on a
>>> first design document, or is Databricks working on it?
>>>
>>> Best,
>>> Stavros
>>>
>>> On Fri, Dec 8, 2017 at 8:40 PM, Burak Yavuz  wrote:
>>>
>>>> Hi Stavros,
>>>>
>>>> Queryable state is definitely on the roadmap! We will revamp the
>>>> StateStore API a bit, and a queryable StateStore is definitely one of the
>>>> things we are thinking about during that revamp.
>>>>
>>>> Best,
>>>> Burak
>>>>
>>>> On Dec 8, 2017 9:57 AM, "Stavros Kontopoulos" wrote:
>>>>
>>>>> Just to rephrase my question: would queryable state make a viable
>>>>> SPIP?
>>>>>
>>>>> Regards,
>>>>> Stavros
>>>>>
>>>>> On Thu, Dec 7, 2017 at 1:34 PM, Stavros Kontopoulos <
>>>>> st.kontopou...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Maybe this has been discussed before. Given that many streaming
>>>>>> apps out there use state extensively, could it be a good idea to
>>>>>> make Spark expose streaming state through an external API, as other
>>>>>> systems do (Kafka Streams, Flink, etc.), in order to facilitate
>>>>>> interactive queries?
>>>>>>
>>>>>> Regards,
>>>>>> Stavros
>>>>>>
>>>>>
>>>>>
>>>
>>
>


Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
Yeah, the main intention of this thread is to gauge interest in a possible
feature list for Structured Streaming. From what I can see in the Spark
community, most of the discussions and contributions are for SQL, and I'd
like to see similar activity and effort on Structured Streaming.
(Unfortunately, less effort goes into reviewing others' work, both design
docs and pull requests; most of the effort seems to be spent on people's own
work.)

I respect the role of PMC members, so the final decision would be up to
them, but contributors as well as end users could show their interest and
discuss requirements for a SPIP, which could be good background for
persuading PMC members.

Before going deep, I guess we could use this thread to discuss possible use
cases, and if we would like to move forward on an individual item, we could
initiate (or resurrect) its discussion thread.

For queryable state, there seems to be no workaround in Spark that provides
something similar, especially as state gets bigger. I may have some concerns
about the details, but I'll add my thoughts on the discussion thread.

- Jungtaek Lim (HeartSaVioR)

On Mon, Oct 22, 2018 at 1:15 AM, Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> Hi Jungtaek,
>
> I tried to start the discussion on the dev list a long time ago.
> I enumerated some use cases, as Michael proposed here.
> The discussion didn't go further.
>
> If people find it useful we should start discussing it in detail again.
>
> Stavros
>
> On Sun, Oct 21, 2018 at 4:54 PM, Jungtaek Lim  wrote:
>
>> Stavros, if my memory is right, you were trying to drive queryable state,
>> right?
>>
>> Could you summarize the progress and the reason why it stalled?
>>
>> On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos <
>> stavros.kontopou...@lightbend.com> wrote:
>>
>>> That is a very interesting list, thanks. I could create a design doc as
>>> a starting point for discussion if this is a feature we would like to
>>> have.
>>>
>>> Regards,
>>> Stavros
>>>
>>> On Sun, Oct 21, 2018 at 3:04 PM, JackyLee  wrote:
>>>
>>>> Thanks for raising them.
>>>>
>>>> FYI, I believe this open issue could also be considered:
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-24630
>>>>
>>>> A new ability to express Structured Streaming in pure SQL.
>>>>
>>>> --
>>>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>
>>>> -
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>
>>>
>>>
>>>
>
>


Re: Support SqlStreaming in spark

2018-10-21 Thread JackyLee
The code for SQLStreaming has been pushed:

https://github.com/apache/spark/pull/22575






Re: data source api v2 refactoring

2018-10-21 Thread JackyLee
I have pushed a patch for SQLStreaming, which resolves the problem just
discussed.
The JIRA:
https://issues.apache.org/jira/browse/SPARK-24630
The patch:
https://github.com/apache/spark/pull/22575

SQLStreaming defines the table API for Structured Streaming, and the table
APIs for streaming and batch are fully compatible.

With SQLStreaming, we can create a stream just like this:
val table = spark.createTable()
spark.table(temp)






Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Stavros Kontopoulos
Hi Jungtaek,

I tried to start the discussion on the dev list a long time ago.
I enumerated some use cases, as Michael proposed here.
The discussion didn't go further.

If people find it useful we should start discussing it in detail again.

Stavros

On Sun, Oct 21, 2018 at 4:54 PM, Jungtaek Lim  wrote:

> Stavros, if my memory is right, you were trying to drive queryable state,
> right?
>
> Could you summarize the progress and the reason why it stalled?
>
> On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> That is a very interesting list, thanks. I could create a design doc as a
>> starting point for discussion if this is a feature we would like to have.
>>
>> Regards,
>> Stavros
>>
>> On Sun, Oct 21, 2018 at 3:04 PM, JackyLee  wrote:
>>
>>> Thanks for raising them.
>>>
>>> FYI, I believe this open issue could also be considered:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-24630
>>>
>>> A new ability to express Structured Streaming in pure SQL.
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>


Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Jungtaek Lim
Stavros, if my memory is right, you were trying to drive queryable state,
right?

Could you summarize the progress and the reason why it stalled?

On Sun, Oct 21, 2018 at 10:27 PM, Stavros Kontopoulos <
stavros.kontopou...@lightbend.com> wrote:

> That is a very interesting list, thanks. I could create a design doc as a
> starting point for discussion if this is a feature we would like to have.
>
> Regards,
> Stavros
>
> On Sun, Oct 21, 2018 at 3:04 PM, JackyLee  wrote:
>
>> Thanks for raising them.
>>
>> FYI, I believe this open issue could also be considered:
>>
>> https://issues.apache.org/jira/browse/SPARK-24630
>>
>> A new ability to express Structured Streaming in pure SQL.
>>
>>
>>
>>
>>
>
>
>
>


Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread Stavros Kontopoulos
That is a very interesting list, thanks. I could create a design doc as a
starting point for discussion if this is a feature we would like to have.

Regards,
Stavros

On Sun, Oct 21, 2018 at 3:04 PM, JackyLee  wrote:

> Thanks for raising them.
>
> FYI, I believe this open issue could also be considered:
>
> https://issues.apache.org/jira/browse/SPARK-24630
>
> A new ability to express Structured Streaming in pure SQL.
>
>
>
>
>


Re: Plan on Structured Streaming in next major/minor release?

2018-10-21 Thread JackyLee
Thanks for raising them.

FYI, I believe this open issue could also be considered:

https://issues.apache.org/jira/browse/SPARK-24630

A new ability to express Structured Streaming in pure SQL.


