Hector is a more industrial-strength client for Cassandra. I have not used it.
https://github.com/rantav/hector

On Sat, Dec 31, 2011 at 10:50 AM, Sean Owen <[email protected]> wrote:
> You might get some mileage out of this article I wrote about using
> Cassandra as input for Hadoop/Mahout, though it's not specific to LDA:
>
> http://www.acunu.com/blogs/sean-owen/scaling-cassandra-and-mahout-hadoop/
>
> On Sat, Dec 31, 2011 at 10:36 AM, Allen <[email protected]> wrote:
>
>> Hello there,
>>
>> I am new to Mahout and trying to get Mahout running on our data
>> storage -- Cassandra. After poking around the LDA example on Reuters
>> data, I have several questions.
>>
>> 1) Where is the source code for seqdirectory and seq2sparse?
>>
>> 2) Before the algorithm can run, it looks like the raw text must be
>> converted and materialized into a sequence file which represents some
>> vectors. Is that true? If so, is there a more efficient way to handle
>> the conversion, like streaming the data? In my project, all the data is
>> in Cassandra. If I need to run some Mahout algorithm, it seems I need
>> to get the data out, put it into a temporary directory in HDFS,
>> convert it into a sequence file and finally turn that into tf-vectors
>> format in HDFS. Then I can run the algorithm. Two temporary copies of
>> the data are stored in the above procedure, which will make the run slow.
>>
>> Many thanks.
>>
>> --
>> Allen
>>

--
Lance Norskog
[email protected]
