Err - "Hi Gary"!
On Sat, Oct 26, 2013 at 10:14 PM, Patrick Wendell <[email protected]> wrote:

> Hey Rohit,
>
> A single SparkContext can be used to read and write files of different
> formats, including HDFS or Cassandra. For instance you could do this:
>
> rdd1 = sc.textFile(XXX) // Some text file in HDFS
> rdd1.saveAsHadoopFile(.., classOf[ColumnFamilyOutputFormat], ...) // Save into a Cassandra column family (see the Cassandra example)
>
> This is a common pattern when using Spark for ETL between different
> storage systems.
>
> - Patrick
>
> On Sat, Oct 26, 2013 at 7:31 PM, Gary Malouf <[email protected]> wrote:
>
>> Hi Rohit,
>>
>> We are big users of the Spark shell - it is used by our analytics team
>> for the same purposes that Hive used to be. The SparkContext provided at
>> startup would, I take it, have to point at either HDFS or Cassandra - would
>> we then manually create a second context?
>>
>> Thanks,
>>
>> Gary
>>
>> On Sat, Oct 26, 2013 at 1:07 PM, Rohit Rai <[email protected]> wrote:
>>
>>> Hello Gary,
>>>
>>> This is very easy to do. You can read your data from HDFS using
>>> FileInputFormat, transform it into the desired rows, and write them to
>>> Cassandra using ColumnFamilyOutputFormat.
>>>
>>> Our library Calliope (Apache licensed),
>>> http://tuplejump.github.io/calliope/, can make the task of writing to C*
>>> easier.
>>>
>>> In case you don't want to convert the data to rows and would rather keep
>>> it as files in Cassandra, our lightweight, Cassandra-backed, HDFS-compatible
>>> filesystem SnackFS can help you. SnackFS will be part of the next Calliope
>>> release later this month, but we can provide you access if you would like
>>> to try it out.
>>>
>>> Feel free to mail me directly in case you need any assistance.
>>>
>>> Regards,
>>> Rohit
>>> founder @ tuplejump
>>>
>>> On Sat, Oct 26, 2013 at 5:45 AM, Gary Malouf <[email protected]> wrote:
>>>
>>>> We have a use case in which much of our raw data is stored in HDFS
>>>> today. We'd like to write our Spark jobs such that they read/aggregate
>>>> data from HDFS and can output to our Cassandra cluster.
>>>>
>>>> Is there any way of doing this in Spark 0.7.3?
>>>
>>> --
>>> ____________________________
>>> www.tuplejump.com
>>> *The Data Engineering Platform*
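For reference, Patrick's snippet can be fleshed out roughly as below. This is a sketch only, assuming a Spark 0.7/0.8-era deployment with Cassandra's Hadoop integration (ColumnFamilyOutputFormat is a new-API OutputFormat, so saveAsNewAPIHadoopFile is used); the host, port, keyspace, column-family, and file path are illustrative placeholders, not values from the thread.

```scala
import java.nio.ByteBuffer

import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}
import org.apache.cassandra.thrift.{Column, ColumnOrSuperColumn, Mutation}
import org.apache.hadoop.conf.Configuration
import spark.SparkContext
import spark.SparkContext._

object HdfsToCassandra {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "HdfsToCassandra")

    // Read raw text from HDFS (path is illustrative).
    val lines = sc.textFile("hdfs://namenode:8020/data/events.txt")

    // Point the Hadoop output side at the Cassandra cluster
    // (host, port, keyspace, and column family are illustrative).
    val conf = new Configuration()
    ConfigHelper.setOutputInitialAddress(conf, "cassandra-host")
    ConfigHelper.setOutputRpcPort(conf, "9160")
    ConfigHelper.setOutputColumnFamily(conf, "my_keyspace", "events")
    ConfigHelper.setOutputPartitioner(conf, "Murmur3Partitioner")

    // Transform each line into (row key, list of mutations),
    // assuming tab-separated "key<TAB>value" input.
    val mutations = lines.map { line =>
      val Array(key, value) = line.split("\t", 2)
      val col = new Column()
      col.setName(ByteBuffer.wrap("body".getBytes("UTF-8")))
      col.setValue(ByteBuffer.wrap(value.getBytes("UTF-8")))
      col.setTimestamp(System.currentTimeMillis)
      val cosc = new ColumnOrSuperColumn().setColumn(col)
      val mutation = new Mutation().setColumn_or_supercolumn(cosc)
      (ByteBuffer.wrap(key.getBytes("UTF-8")),
        java.util.Collections.singletonList(mutation))
    }

    // Write the rows into Cassandra; the "path" argument is the keyspace.
    mutations.saveAsNewAPIHadoopFile(
      "my_keyspace",
      classOf[ByteBuffer],
      classOf[java.util.List[Mutation]],
      classOf[ColumnFamilyOutputFormat],
      conf)
  }
}
```

As the thread notes, a single SparkContext handles both ends: the read targets HDFS via the input path's scheme, while the write is routed to Cassandra entirely by the output format and its Hadoop Configuration, so no second context is needed.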
