Hello Gary,

This is quite easy to do. You can read your data from HDFS using FileInputFormat, transform it into the desired rows, and write to Cassandra using ColumnFamilyOutputFormat.
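To give you an idea, here is a rough sketch of that route (written against Spark 0.7.x and Cassandra's Thrift-based ColumnFamilyOutputFormat; the master URL, HDFS path, hosts, keyspace and column family names below are placeholders you would replace with your own):

    import java.nio.ByteBuffer
    import java.util.{List => JList, ArrayList}
    import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}
    import org.apache.cassandra.thrift.{Column, ColumnOrSuperColumn, Mutation}
    import org.apache.cassandra.utils.ByteBufferUtil
    import org.apache.hadoop.mapreduce.Job
    import spark.SparkContext
    import spark.SparkContext._

    object HdfsToCassandra {
      def main(args: Array[String]) {
        val sc = new SparkContext("spark://master:7077", "HdfsToCassandra")

        // The Hadoop Job object only carries the Cassandra output configuration.
        val job = new Job()
        job.setOutputFormatClass(classOf[ColumnFamilyOutputFormat])
        ConfigHelper.setOutputInitialAddress(job.getConfiguration, "cassandra-host")
        ConfigHelper.setOutputRpcPort(job.getConfiguration, "9160")
        ConfigHelper.setOutputColumnFamily(job.getConfiguration, "MyKeyspace", "MyCF")
        ConfigHelper.setOutputPartitioner(job.getConfiguration, "Murmur3Partitioner")

        // Read and aggregate the raw data from HDFS.
        val counts = sc.textFile("hdfs://namenode:9000/raw/events")
          .map(line => (line.split("\t")(0), 1L))
          .reduceByKey(_ + _)

        // Turn each aggregate into a (row key, mutation list) pair,
        // the key/value types ColumnFamilyOutputFormat expects.
        val mutations = counts.map { case (key, count) =>
          val column = new Column(ByteBufferUtil.bytes("count"))
          column.setValue(ByteBufferUtil.bytes(count))
          column.setTimestamp(System.currentTimeMillis * 1000) // microseconds
          val cosc = new ColumnOrSuperColumn()
          cosc.setColumn(column)
          val mutation = new Mutation()
          mutation.setColumn_or_supercolumn(cosc)
          val muts: JList[Mutation] = new ArrayList[Mutation]()
          muts.add(mutation)
          (ByteBufferUtil.bytes(key), muts)
        }

        // The path argument is unused here; ColumnFamilyOutputFormat
        // writes over Thrift using the configuration set above.
        mutations.saveAsNewAPIHadoopFile(
          "MyKeyspace",
          classOf[ByteBuffer], classOf[JList[Mutation]],
          classOf[ColumnFamilyOutputFormat], job.getConfiguration)
      }
    }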
Our library Calliope (Apache licensed), http://tuplejump.github.io/calliope/, can make the task of writing to C* easier.

In case you don't want to convert the data to rows and would rather keep it as files in Cassandra, our lightweight, Cassandra-backed, HDFS-compatible filesystem SnackFS can help you. SnackFS will be part of the next Calliope release later this month, but we can provide you access earlier if you would like to try it out.

Feel free to mail me directly in case you need any assistance.

Regards,
Rohit
founder @ tuplejump

On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <[email protected]> wrote:

> We have a use case in which much of our raw data is stored in HDFS today.
> We'd like to write our Spark jobs such that they read/aggregate data from
> HDFS and can output to our Cassandra cluster.
>
> Is there any way of doing this in spark 0.7.3?

--
____________________________
www.tuplejump.com
*The Data Engineering Platform*
