Hi Rohit,

We are big users of the Spark Shell - our analytics team uses it for the same purposes that Hive used to serve. The SparkContext provided at startup would, I assume, be configured for either HDFS or Cassandra - I take it we would then manually create a second context for the other?
Thanks,
Gary

On Sat, Oct 26, 2013 at 1:07 PM, Rohit Rai <[email protected]> wrote:

> Hello Gary,
>
> This is very easy to do. You can read your data from HDFS using
> FileInputFormat, transform it into the desired rows, and write them to
> Cassandra using ColumnFamilyOutputFormat.
>
> Our library Calliope (Apache licensed),
> http://tuplejump.github.io/calliope/, can make the task of writing to C*
> easier.
>
> In case you don't want to convert the data to rows and would rather keep
> it as files in Cassandra, our lightweight Cassandra-backed, HDFS-compatible
> filesystem SnackFS can help you. SnackFS will be part of the next Calliope
> release later this month, but we can provide you access if you would like
> to try it out.
>
> Feel free to mail me directly in case you need any assistance.
>
> Regards,
> Rohit
> founder @ tuplejump
>
> On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <[email protected]> wrote:
>
>> We have a use case in which much of our raw data is stored in HDFS today.
>> We'd like to write our Spark jobs such that they read/aggregate data from
>> HDFS and can output to our Cassandra cluster.
>>
>> Is there any way of doing this in spark 0.7.3?
>
> --
> ____________________________
> www.tuplejump.com
> *The Data Engineering Platform*
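
[Editor's note: for readers landing on this thread, below is a rough, untested sketch of the approach Rohit describes - read from HDFS with sc.textFile, build Thrift Mutations, and write via Cassandra's ColumnFamilyOutputFormat through saveAsNewAPIHadoopFile. It assumes Spark 0.7.x (hence the old `spark` package) and the Cassandra 1.x Hadoop/Thrift API; the host, port, keyspace, column family, partitioner, and HDFS path are all placeholders you would replace with your own values.]

  import java.nio.ByteBuffer
  import java.util.{Collections, List => JList}

  import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}
  import org.apache.cassandra.thrift.{Column, ColumnOrSuperColumn, Mutation}
  import org.apache.cassandra.utils.ByteBufferUtil
  import org.apache.hadoop.mapreduce.Job

  import spark.SparkContext
  import spark.SparkContext._  // implicit conversions for pair-RDD saves

  object Hdfs2Cassandra {
    def main(args: Array[String]) {
      val sc = new SparkContext("local[4]", "hdfs2cassandra")

      // Configure the Cassandra output side via a Hadoop Job object.
      val job = new Job()
      ConfigHelper.setOutputInitialAddress(job.getConfiguration, "cassandra-host")
      ConfigHelper.setOutputRpcPort(job.getConfiguration, "9160")
      ConfigHelper.setOutputColumnFamily(job.getConfiguration, "MyKeyspace", "MyColumnFamily")
      // Must match the partitioner your cluster actually uses.
      ConfigHelper.setOutputPartitioner(job.getConfiguration,
        "org.apache.cassandra.dht.RandomPartitioner")

      // Read raw tab-separated records from HDFS.
      val lines = sc.textFile("hdfs://namenode:8020/path/to/raw")

      // Turn each record into (rowKey, list of mutations), the key/value
      // pair ColumnFamilyOutputFormat expects.
      val rows = lines.map { line =>
        val fields = line.split("\t")
        val col = new Column()
        col.setName(ByteBufferUtil.bytes("value"))
        col.setValue(ByteBufferUtil.bytes(fields(1)))
        col.setTimestamp(System.currentTimeMillis)
        val cosc = new ColumnOrSuperColumn()
        cosc.setColumn(col)
        val mutation = new Mutation()
        mutation.setColumn_or_supercolumn(cosc)
        val key: ByteBuffer = ByteBufferUtil.bytes(fields(0))
        (key, Collections.singletonList(mutation): JList[Mutation])
      }

      // The path argument is ignored by ColumnFamilyOutputFormat but is
      // required by the saveAsNewAPIHadoopFile signature.
      rows.saveAsNewAPIHadoopFile(
        "unused",
        classOf[ByteBuffer],
        classOf[JList[Mutation]],
        classOf[ColumnFamilyOutputFormat],
        job.getConfiguration)
    }
  }

[The same pattern works from the Spark Shell by reusing the shell's SparkContext instead of constructing one; Calliope wraps this boilerplate behind a friendlier API.]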
