Folks:

SnappyData.

I’m fairly new to working with it myself, but it looks pretty promising. It 
marries Spark with a co-located, GemFire-based in-memory database (GemFire XD, 
I believe). So you can access the data with SQL, JDBC, ODBC (only if you go 
Enterprise instead of open-source), or natively from Spark as RDDs and 
DataFrames over mutable tables.

You can run it so that storage and Spark compute share the same JVM on each 
machine, which gives you data locality instead of a bottleneck between the 
load, save, and compute steps. The data is supposed to persist across 
applications and cluster restarts, and multiple applications are supposed to 
be able to work on it at the same time.
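
To make that access pattern concrete, here is a rough sketch of several
applications looking up against one shared table over SQL while a writer
refreshes it incrementally. This is NOT SnappyData's API: Python's built-in
sqlite3 module is used purely as a stand-in for the shared store, and the
table name ds1 and the helpers open_store, refresh, lookup, and enrich are
all made up for illustration.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open a connection to the shared store (one per 'application').

    sqlite3 is only a stand-in here; a real setup would connect to the
    shared in-memory database instead.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ds1 (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()
    return conn


def refresh(conn, rows):
    """Incrementally upsert (key, value) rows into the shared lookup table."""
    conn.executemany(
        "INSERT OR REPLACE INTO ds1 (key, value) VALUES (?, ?)", rows
    )
    conn.commit()


def lookup(conn, key):
    """Return the value stored for key, or None if the key is absent."""
    rows = conn.execute(
        "SELECT value FROM ds1 WHERE key = ?", (key,)
    ).fetchall()
    return rows[0][0] if rows else None


def enrich(conn, events):
    """Look up each incoming (key, payload) event against the shared table."""
    return [(key, payload, lookup(conn, key)) for key, payload in events]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.db")
        # One writer and two independent readers, each with its own connection.
        writer, app3, app4 = open_store(path), open_store(path), open_store(path)
        refresh(writer, [("u1", "gold")])
        print(enrich(app3, [("u1", "click")]))
        refresh(writer, [("u1", "platinum")])  # incremental refresh
        print(enrich(app4, [("u1", "view")]))
        for conn in (writer, app3, app4):
            conn.close()
```

The point of the sketch is only the shape of the interaction: every
application holds its own connection, lookups always see the latest committed
refresh, and nobody has to reload the dataset from S3 to pick up changes.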

I hope it works for what I’m doing and isn’t too buggy. But it looks pretty 
good.

—Joe Pride

> On Oct 31, 2017, at 11:14 AM, Gene Pang <gene.p...@gmail.com> wrote:
> 
> Hi,
> 
> Alluxio enables sharing dataframes across different applications. This blog 
> post talks about dataframes and Alluxio, and this Spark Summit presentation 
> has additional information.
> 
> Thanks,
> Gene
> 
>> On Tue, Oct 31, 2017 at 6:04 PM, Revin Chalil <rcha...@expedia.com> wrote:
>> Any info on the below will be really appreciated.
>> 
>> I read about Alluxio and Ignite. Has anybody used either of them? Do they 
>> work well with multiple apps doing lookups simultaneously? Are there better 
>> options? Thank you.
>> 
>> From: roshan joe <impdocs2...@gmail.com>
>> Date: Monday, October 30, 2017 at 7:53 PM
>> To: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: share datasets across multiple spark-streaming applications for 
>> lookup
>> 
>> Hi,
>> 
>> What is the recommended way to share datasets across multiple 
>> spark-streaming applications, so that the incoming data can be looked up 
>> against this shared dataset?
>> 
>> The shared dataset is also incrementally refreshed and stored on S3. Below 
>> is the scenario.
>> 
>> Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
>> 
>> Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.
>> 
>> Streaming App-3 consumes data from Source-3, needs to look up against DS-1 
>> and DS-2, and writes to DS-3 in S3.
>> 
>> Streaming App-4 consumes data from Source-4, needs to look up against DS-1 
>> and DS-2, and writes to DS-3 in S3.
>> 
>> Streaming App-n consumes data from Source-n, needs to look up against DS-1 
>> and DS-2, and writes to DS-n in S3.
>> 
>> So DS-1 and DS-2 ideally should be shared for lookup across multiple 
>> streaming apps. Any input is appreciated. Thank you!
>> 
> 
