Hey Rohit,

A single SparkContext can be used to read and write data across different storage systems and formats, including HDFS and Cassandra. For instance, you could do this:

    val rdd1 = sc.textFile(XXX)  // Some text file in HDFS
    rdd1.saveAsHadoopFile(.., classOf[ColumnFamilyOutputFormat], ...)  // Save into Cassandra (see Cassandra example)

This is a common pattern when using Spark for ETL between different storage systems.

- Patrick

On Sat, Oct 26, 2013 at 7:31 PM, Gary Malouf <[email protected]> wrote:

> Hi Rohit,
>
> We are big users of the Spark Shell - it is used by our analytics team for
> the same purposes that Hive used to be. The SparkContext which is provided
> at startup I guess would have to be one of HDFS or Cassandra - I take it we
> would then manually create a second context?
>
> Thanks,
>
> Gary
>
>
> On Sat, Oct 26, 2013 at 1:07 PM, Rohit Rai <[email protected]> wrote:
>
>> Hello Gary,
>>
>> This is very easy to do. You can read your data from HDFS using
>> FileInputFormat, transform it to the desired rows and write to Cassandra
>> using ColumnFamilyOutputFormat.
>>
>> Our library called Calliope (Apache Licensed),
>> http://tuplejump.github.io/calliope/ can make the task of writing to C*
>> easier.
>>
>>
>> In case you don't want to convert it to rows and keep them as files in
>> Cassandra, our lightweight Cassandra backed HDFS compatible filesystem,
>> SnackFS can help you. SnackFS will be part of next Calliope release later
>> this month, but we can provide you access if you would like to try it out.
>>
>> Feel free to mail me directly in case you need any assistance.
>>
>>
>> Regards,
>> Rohit
>> founder @ tuplejump
>>
>>
>> On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <[email protected]> wrote:
>>
>>> We have a use case in which much of our raw data is stored in HDFS
>>> today. We'd like to write our Spark jobs such that they read/aggregate
>>> data from HDFS and can output to our Cassandra cluster.
>>>
>>> Is there any way of doing this in spark 0.7.3?
>>>
>>
>> --
>>
>> ____________________________
>> www.tuplejump.com
>> *The Data Engineering Platform*
>>
