Re: SnappyData and Structured Streaming

2016-07-06 Thread Benjamin Kim
Jags,

Thanks for the details. This makes things much clearer. I saw in the Spark 
roadmap that version 2.1 will add the SQL capabilities mentioned here. It looks 
like the Spark community is gradually coming to the same conclusions about 
streaming that the SnappyData folks reached a while back. But there is always 
the need for a better way to store data underlying Spark. The State Store 
information was informative too; I can envision it using this data store as 
well if need be.

Thanks again,
Ben

> On Jul 6, 2016, at 8:52 AM, Jags Ramnarayan wrote:
> 
> The plan is to fully integrate with the new Structured Streaming API and 
> implementation in an upcoming release, but we will continue offering several 
> extensions. A few are noted below:
> 
> - The store (streaming sink) will offer a lot more capabilities, like 
> transactions, replicated tables, and partitioned row- and column-oriented 
> tables to suit different types of workloads.
> - While the streaming API (Scala) in SnappyData itself will change a bit to 
> become fully compatible with Structured Streaming (SchemaDStream will go 
> away), we will continue to offer SQL support for streams so that they can be 
> managed from external clients (JDBC, ODBC), their partitions can share the 
> same partitioning strategy as the underlying table where the stream might be 
> stored, and continuous queries can even be registered from remote clients.
> 
> While building streaming apps using the Spark API offers tremendous 
> flexibility, we also want to make it simple for apps to work with streams 
> using just SQL. For instance, you should be able to declaratively specify a 
> table as a sink to a stream (i.e. using SQL). For example, you can specify a 
> "TopK table" (a built-in special table for top-K analytics using 
> probabilistic data structures) as a sink for a high-velocity time-series 
> stream like this:
> 
> create topK table MostPopularTweets on tweetStreamTable
>   options(key 'hashtag', frequencyCol 'retweets', timeSeriesColumn 'tweetTime')
> 
> where 'tweetStreamTable' is created using the 'create stream table ...' SQL 
> syntax.
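The thread does not name the "probabilistic data structures" behind a TopK table, but a count-min sketch paired with a candidate set is one common way to approximate top-K over a high-velocity stream in bounded memory. The Python sketch below is an illustrative stand-in under that assumption; all class and parameter names are made up, and this is not SnappyData's implementation:

```python
import heapq


class CountMinSketch:
    """Approximate frequency counts in fixed memory.

    Estimates never undercount; taking the minimum across several
    hash rows bounds how much collisions can overcount."""

    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One bucket per row, derived from (row index, key).
        for i in range(self.depth):
            yield i, hash((i, key)) % self.width

    def add(self, key, count=1):
        for i, j in self._cells(key):
            self.rows[i][j] += count

    def estimate(self, key):
        return min(self.rows[i][j] for i, j in self._cells(key))


class TopK:
    """Track the k most frequent keys seen so far.

    Simplification: the candidate set here grows with the number of
    distinct keys; a production top-K would bound it (e.g. keep only
    a small heap of current leaders)."""

    def __init__(self, k, sketch=None):
        self.k = k
        self.sketch = sketch or CountMinSketch()
        self.candidates = set()

    def add(self, key, count=1):
        self.sketch.add(key, count)
        self.candidates.add(key)

    def top(self):
        return heapq.nlargest(self.k, self.candidates,
                              key=self.sketch.estimate)


# Feeding the sketch the way the hypothetical TopK table would:
# key = hashtag, frequency column = retweets.
tracker = TopK(k=2)
for hashtag, retweets in [("#spark", 50), ("#scala", 30), ("#misc", 1)]:
    tracker.add(hashtag, retweets)
```

Here `tracker.top()` returns the two hashtags with the largest estimated retweet counts; the time-series column from the SQL example has no analogue in this minimal sketch.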
> 
> 
> -
> Jags
> SnappyData blog 
> Download binary, source 
> 
> 
> On Wed, Jul 6, 2016 at 8:02 PM, Benjamin Kim wrote:
> Jags,
> 
> I should have been more specific. I am referring to what I read at 
> http://snappydatainc.github.io/snappydata/streamingWithSQL/, especially the 
> Streaming Tables part. It roughly coincides with the Streaming DataFrames 
> outlined here: 
> https://docs.google.com/document/d/1NHKdRSNCbCmJbinLmZuqNA1Pt6CGpFnLVRbzuDUcZVM/edit#heading=h.ff0opfdo6q1h.
> I don't know if I'm wrong, but they both sound very similar. That's why I 
> posed this question.
> 
> Thanks,
> Ben
> 
>> On Jul 6, 2016, at 7:03 AM, Jags Ramnarayan wrote:
>> 
>> Ben,
>>    Note that SnappyData's primary objective is to be a distributed 
>> in-memory DB for mixed workloads (i.e. streaming with transactions and 
>> analytic queries). Spark, on the other hand, is to date primarily designed 
>> as a processing engine over myriad storage engines (SnappyData being one). 
>> So, the marriage is quite complementary. The difference compared to other 
>> stores is that SnappyData realizes its solution by deeply integrating and 
>> collocating with Spark (i.e. sharing Spark executor memory/resources with 
>> the store), avoiding serialization and shuffles in many situations.
>> 
>> On your specific thought about being similar to Structured Streaming, a 
>> better comparison might be to the recently introduced State store (perhaps 
>> this is what you meant). It proposes a KV store for streaming aggregations 
>> with support for updates. The proposed API will, at some point, be pluggable 
>> so vendors can easily supply alternate storage implementations, not just 
>> HDFS (the default store in the proposed State store).
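The State store design being discussed boils down to versioned, keyed state: each micro-batch reads the previous version, applies its updates, and commits a new version, so a failed batch can be retried against the old one. The toy Python class below illustrates only that idea; it is not Spark's actual StateStore API (which is internal and checkpoints to a fault-tolerant store such as HDFS), and all names here are invented for illustration:

```python
class InMemoryStateStore:
    """Toy versioned key-value store for streaming aggregation.

    Illustrative only: one committed dict of state per micro-batch,
    built copy-on-write from the previous version so older versions
    stay readable (e.g. for retrying a failed batch)."""

    def __init__(self):
        self.versions = {0: {}}  # version number -> {key: value}
        self.latest = 0

    def update_batch(self, batch, reduce_fn):
        """Fold one micro-batch of (key, value) pairs into new state.

        reduce_fn(old_value_or_None, new_value) -> merged value."""
        # Copy-on-write: never mutate an already-committed version.
        state = dict(self.versions[self.latest])
        for key, value in batch:
            state[key] = reduce_fn(state.get(key), value)
        self.latest += 1
        self.versions[self.latest] = state  # the "commit"
        return self.latest

    def get(self, key, version=None):
        chosen = self.latest if version is None else version
        return self.versions[chosen].get(key)


# A running-count aggregation across two micro-batches.
store = InMemoryStateStore()
count = lambda old, n: (old or 0) + n
store.update_batch([("#spark", 2), ("#scala", 1)], count)
store.update_batch([("#spark", 3)], count)
```

After the second batch, `store.get("#spark")` reflects both batches, while `store.get("#spark", version=1)` still returns the count as of the first commit; swapping the in-memory dicts for a store like SnappyData's tables is exactly the kind of pluggability the quoted message describes.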
>> 
>> 
>> -
>> Jags
>> SnappyData blog 
>> Download binary, source 
>> 
>> 
>> On Wed, Jul 6, 2016 at 12:49 AM, Benjamin Kim wrote:
>> I recently got a sales email from SnappyData, and after reading the 
>> documentation about what they offer, it sounds very similar to what 
>> Structured Streaming will offer without the underlying in-memory, 
>> spill-to-disk, CRUD-compliant data storage in SnappyData. I was wondering 
>> if Structured Streaming is trying to achieve the same on its own, or if 
>> SnappyData is contributing the streaming extensions they built to the 
>> Spark project. Lastly, what does the Spark community think of this 
>> so-called “Spark Data Store”?
>> 
>> Thanks,
>> Ben


SnappyData and Structured Streaming

2016-07-05 Thread Benjamin Kim
I recently got a sales email from SnappyData, and after reading the 
documentation about what they offer, it sounds very similar to what Structured 
Streaming will offer without the underlying in-memory, spill-to-disk, 
CRUD-compliant data storage in SnappyData. I was wondering if Structured 
Streaming is trying to achieve the same on its own, or if SnappyData is 
contributing the streaming extensions they built to the Spark project. Lastly, 
what does the Spark community think of this so-called “Spark Data Store”?

Thanks,
Ben
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org