Honestly, I think what they did regarding Brisk development is fair. They left the code for any of us in the community to improve it and make it compatible with newer versions, and as a company they need to make money as well. They already contribute a great deal to the Cassandra community in general, and they are certainly not trying to stop people from continuing to develop Brisk. Hadoop jobs that read from and write to Cassandra will still work without it. If you need the features of CFS and don't want to maintain HDFS, then yes, you'll have to pay for DSE.
If you are having issues with data not being on the particular node you are reading from with Hadoop, I'd go ahead and set the consistency level in your job configuration as I recommended previously. Note there is also a cassandra.consistencylevel.write setting if you are using either the ColumnFamilyOutputFormat or BulkOutputFormat classes. In terms of performance, I have an MR job that reads 30 million rows at QUORUM consistency on 1.1.6 with RandomPartitioner, and the map phase takes about 11 minutes across 3 Hadoop nodes (our Cassandra cluster is obviously larger, but we haven't fully scaled out our Hadoop cluster yet). Hardware is two 7200 rpm drives plus an SSD for the commit log, 32 GB of RAM, and 12 cores per node. Hope this helps.

Best,
michael

On 10/18/12 12:24 PM, "Jean-Nicolas Boulay Desjardins" <jnbdzjn...@gmail.com> wrote:

>I am surprised that it was abandoned this way. So if I want to use
>Brisk on Cassandra 1.1 I have to use the DataStax Enterprise service...
>
>On Thu, Oct 18, 2012 at 3:00 PM, Michael Kjellman
><mkjell...@barracuda.com> wrote:
>>
>> Unless you have Brisk (as far as I know there was one fork that
>> got it working on 1.0, but nothing for 1.1, and it is not being actively
>> maintained by DataStax) or go with CFS (which comes with DSE), you are not
>> guaranteed all data is on that Hadoop node. You can take a look at the
>> forks if interested here: https://github.com/riptano/brisk/network but I'd
>> personally be afraid to put my eggs in a basket that is certainly not
>> well supported anymore.
>>
>> job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
>> should get you started.
>>
>> Best,
>>
>> michael
>>
>> From: Jean-Nicolas Boulay Desjardins <jnbdzjn...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, October 18, 2012 11:49 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: hadoop consistency level
>>
>> Why don't you look into Brisk:
>> http://www.datastax.com/docs/0.8/brisk/about_brisk
>>
>> On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh <ailin...@gmail.com>
>> wrote:
>>>
>>> Hello, everybody!
>>> I'm thinking about running Hadoop jobs on top of the Cassandra
>>> cluster. My understanding is that Hadoop jobs read data from local
>>> nodes only. Does it mean the consistency level is always ONE?
>>>
>>> Thank you,
>>> Andrey
>>
>> ----------------------------------
>> 'Like' us on Facebook for exclusive content and other resources on all
>> Barracuda Networks solutions.
>> Visit http://barracudanetworks.com/facebook
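To make the two properties from this thread concrete, here is a minimal sketch of a Hadoop job setup that sets both the read and write consistency levels. It is a sketch only: the class and job names are illustrative, the property keys are the ones quoted in the thread, and the input/output format classes are assumed to be the Cassandra 1.1-era ones from the org.apache.cassandra.hadoop package (check them against your Cassandra version before relying on this).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;

// Illustrative driver class; not from the thread.
public class CassandraJobSetup {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "cassandra-mr");

        // Read/write Cassandra directly rather than HDFS.
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

        // Reads at QUORUM, so results don't depend on whatever data
        // happens to live on the local node (the original question).
        job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");

        // Write-side counterpart, honored by ColumnFamilyOutputFormat
        // and BulkOutputFormat.
        job.getConfiguration().set("cassandra.consistencylevel.write", "QUORUM");

        // ... set mapper/reducer, keyspace, column family, etc., then:
        // job.waitForCompletion(true);
    }
}
```

Note this is configuration only; the keyspace, column family, partitioner, and initial contact address still have to be set (the era's ConfigHelper class is the usual way) before the job will run.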