Folks: SnappyData.
I’m fairly new to working with it myself, but it looks promising. It marries Spark with a co-located in-memory database derived from GemFire (GemFire XD, I believe). You can access the data with SQL, JDBC, or ODBC (if you want to go Enterprise instead of open source), or natively as mutable RDDs and DataFrames. Storage and Spark compute can run in the same JVM on each machine, so you get data locality instead of a bottleneck between load, save, and compute. The data is supposed to persist across applications and cluster restarts, and multiple applications can work on it at the same time.

I hope it works for what I’m doing and isn’t too buggy, but it looks pretty good.

Joe Pride

> On Oct 31, 2017, at 11:14 AM, Gene Pang <gene.p...@gmail.com> wrote:
>
> Hi,
>
> Alluxio enables sharing dataframes across different applications. This blog
> post talks about dataframes and Alluxio, and this Spark Summit presentation
> has additional information.
>
> Thanks,
> Gene
>
>> On Tue, Oct 31, 2017 at 6:04 PM, Revin Chalil <rcha...@expedia.com> wrote:
>> Any info on the below will be really appreciated.
>>
>> I read about Alluxio and Ignite. Has anybody used any of them? Do they work
>> well with multiple Apps doing lookups simultaneously? Are there better
>> options? Thank you.
>>
>> From: roshan joe <impdocs2...@gmail.com>
>> Date: Monday, October 30, 2017 at 7:53 PM
>> To: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: share datasets across multiple spark-streaming applications for
>> lookup
>>
>> Hi,
>>
>> What is the recommended way to share datasets across multiple
>> spark-streaming applications, so that the incoming data can be looked up
>> against this shared dataset?
>>
>> The shared dataset is also incrementally refreshed and stored on S3. Below
>> is the scenario.
>>
>> Streaming App-1 consumes data from Source-1 and writes to DS-1 in S3.
>>
>> Streaming App-2 consumes data from Source-2 and writes to DS-2 in S3.
>>
>> Streaming App-3 consumes data from Source-3, needs to lookup against DS-1
>> and DS-2 and write to DS-3 in S3.
>>
>> Streaming App-4 consumes data from Source-4, needs to lookup against DS-1
>> and DS-2 and write to DS-3 in S3.
>>
>> Streaming App-n consumes data from Source-n, needs to lookup against DS-1
>> and DS-2 and write to DS-n in S3.
>>
>> So DS-1 and DS-2 ideally should be shared for lookup across multiple
>> streaming apps. Any input is appreciated. Thank you!