Hello Gary,

This is quite easy to do. You can read your data from HDFS using FileInputFormat, transform it into the desired rows, and write to Cassandra using ColumnFamilyOutputFormat.
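To give you an idea, here is a rough sketch of that route (written against Spark 0.7.x and Cassandra's Thrift-based ColumnFamilyOutputFormat; the master URL, HDFS path, hosts, keyspace and column family names below are placeholders you would replace with your own):

    import java.nio.ByteBuffer
    import java.util.{List => JList, ArrayList}
    import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}
    import org.apache.cassandra.thrift.{Column, ColumnOrSuperColumn, Mutation}
    import org.apache.cassandra.utils.ByteBufferUtil
    import org.apache.hadoop.mapreduce.Job
    import spark.SparkContext
    import spark.SparkContext._

    object HdfsToCassandra {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://master:7077", "HdfsToCassandra")

        // The Hadoop Job object only carries the Cassandra output configuration.
        val job = new Job()
        job.setOutputFormatClass(classOf[ColumnFamilyOutputFormat])
        ConfigHelper.setOutputInitialAddress(job.getConfiguration, "cassandra-host")
        ConfigHelper.setOutputRpcPort(job.getConfiguration, "9160")
        ConfigHelper.setOutputColumnFamily(job.getConfiguration, "MyKeyspace", "MyCF")
        ConfigHelper.setOutputPartitioner(job.getConfiguration, "Murmur3Partitioner")

        // Read and aggregate the raw data from HDFS.
        val counts = sc.textFile("hdfs://namenode:9000/raw/events")
          .map(line => (line.split("\t")(0), 1L))
          .reduceByKey(_ + _)

        // Turn each aggregate into a (row key, mutation list) pair,
        // the key/value types ColumnFamilyOutputFormat expects.
        val mutations = counts.map { case (key, count) =>
          val column = new Column(ByteBufferUtil.bytes("count"))
          column.setValue(ByteBufferUtil.bytes(count))
          column.setTimestamp(System.currentTimeMillis * 1000) // microseconds
          val cosc = new ColumnOrSuperColumn()
          cosc.setColumn(column)
          val mutation = new Mutation()
          mutation.setColumn_or_supercolumn(cosc)
          val muts: JList[Mutation] = new ArrayList[Mutation]()
          muts.add(mutation)
          (ByteBufferUtil.bytes(key), muts)
        }

        // The path argument is unused here; ColumnFamilyOutputFormat
        // writes over Thrift using the configuration set above.
        mutations.saveAsNewAPIHadoopFile(
          "MyKeyspace",
          classOf[ByteBuffer], classOf[JList[Mutation]],
          classOf[ColumnFamilyOutputFormat], job.getConfiguration)
      }
    }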
Our library Calliope (Apache licensed), http://tuplejump.github.io/calliope/, can make the task of writing to C* easier.

In case you don't want to convert the data to rows and would rather keep it as files in Cassandra, our lightweight, Cassandra-backed, HDFS-compatible filesystem SnackFS can help you. SnackFS will be part of the next Calliope release later this month, but we can provide you access earlier if you would like to try it out.

Feel free to mail me directly in case you need any assistance.

Regards,
Rohit
founder @ tuplejump

On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <[email protected]> wrote:

> We have a use case in which much of our raw data is stored in HDFS today.
> We'd like to write our Spark jobs such that they read/aggregate data from
> HDFS and can output to our Cassandra cluster.
>
> Is there any way of doing this in spark 0.7.3?

--
____________________________
www.tuplejump.com
*The Data Engineering Platform*
