You could look at using Cassandra for storage. Spark integrates nicely with 
Cassandra: the combination gives you fast access to structured data stored in 
Cassandra while enabling analytic workloads via Spark. Cassandra would also 
take care of the replication, as that is one of the core features of the 
database.
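
For illustration, here is a minimal sketch of that combination, assuming the 
DataStax spark-cassandra-connector is on the classpath and a Cassandra node is 
reachable from the driver; the keyspace "demo" and the tables "events" and 
"event_counts" are hypothetical placeholders:

  // Read a Cassandra table as an RDD, aggregate it with Spark, write back.
  import org.apache.spark.{SparkConf, SparkContext}
  import com.datastax.spark.connector._

  object CassandraExample {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("cassandra-example")
        // Assumed contact point; point this at one of your Cassandra nodes.
        .set("spark.cassandra.connection.host", "127.0.0.1")
      val sc = new SparkContext(conf)

      // Load the (hypothetical) demo.events table as an RDD of CassandraRow.
      val events = sc.cassandraTable("demo", "events")

      // Ordinary Spark map-reduce over the Cassandra data.
      val countsByType = events
        .map(row => (row.getString("event_type"), 1L))
        .reduceByKey(_ + _)

      // Persist the aggregates to another (hypothetical) table.
      countsByType.saveToCassandra("demo", "event_counts",
        SomeColumns("event_type", "count"))

      sc.stop()
    }
  }

Replication itself is then just a property of the keyspace (its replication 
factor), not something the Spark job has to manage.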

Date: Sat, 24 Jan 2015 23:34:15 +0200
Subject: Full per node replication level (architecture question)
From: dev.ma...@gmail.com
To: u...@spark.incubator.apache.org

Hi,
I wonder whether any of the file systems supported by Spark can support a 
replication level whereby each node holds a full copy of the data. I realize 
this was not the main scenario Spark/Hadoop were designed for, but it may be a 
good fit for a compute cluster that needs to be very fast over its input data 
and that has only a few terabytes of data in total (which fit nicely on any 
commodity disk, and soon on any SSD).
It would be nice to use Spark map-reduce over the data and enjoy automatic 
replication.
It would also be nice to assume Spark can seamlessly manage a job's workflow 
across such a cluster...
Thanks!
Matan
