Honestly, I think what they did regarding Brisk development is fair. They
left the code for anyone in the community to improve it and make it
compatible with newer versions, and as a company they need to make money as
well. They already contribute a great deal to the Cassandra community in
general, and they are certainly not trying to stop people from continuing
to develop Brisk. Hadoop jobs that read from and write to Cassandra will
still work without it. If you need the features of CFS and don't want to
maintain HDFS, then yes, you'll have to pay for DSE.

If you are having issues with data not being local to the node you are
reading from with Hadoop, go ahead and set the consistency level in
your job configuration as I recommended previously. Note there is also a
cassandra.consistencylevel.write setting if you are using either the
ColumnFamilyOutputFormat or BulkOutputFormat classes.
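A quick sketch of those two settings together (just to show the key names;
I don't have a Hadoop runtime in front of me here, so a plain
java.util.Properties stands in for the Hadoop Configuration object, on which
you'd call job.getConfiguration().set(...) the same way):

```java
import java.util.Properties;

public class ConsistencySettings {
    public static void main(String[] args) {
        // Stand-in for org.apache.hadoop.conf.Configuration. The key names
        // below are the ones Cassandra's Hadoop input/output formats look at.
        Properties conf = new Properties();

        // Read side: picked up by ColumnFamilyInputFormat.
        conf.setProperty("cassandra.consistencylevel.read", "QUORUM");

        // Write side: picked up by ColumnFamilyOutputFormat / BulkOutputFormat.
        conf.setProperty("cassandra.consistencylevel.write", "QUORUM");

        System.out.println(conf.getProperty("cassandra.consistencylevel.read"));
        System.out.println(conf.getProperty("cassandra.consistencylevel.write"));
    }
}
```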

In terms of performance, I have an MR job that reads 30 million rows at
QUORUM consistency on 1.1.6 with RandomPartitioner, and the map phase takes
about 11 minutes across 3 Hadoop nodes (our Cassandra cluster is obviously
larger, but we haven't fully scaled out our Hadoop cluster yet). Hardware
is 2 x 7200 rpm drives plus an SSD for the commit log, 32 GB of RAM, and 12
cores per node. Hope this helps.
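Back-of-envelope, using only the figures above, that works out to roughly:

```java
public class Throughput {
    public static void main(String[] args) {
        long rows = 30000000L;        // rows read by the map phase
        long seconds = 11 * 60;       // ~11 minutes
        long nodes = 3;               // Hadoop nodes

        long rowsPerSec = rows / seconds;       // aggregate read rate
        long perNode = rowsPerSec / nodes;      // per Hadoop node

        System.out.println(rowsPerSec + " rows/s total, ~"
                + perNode + " rows/s per node");
    }
}
```

so on the order of 45k rows/s aggregate, ~15k rows/s per Hadoop node.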

Best,
michael

On 10/18/12 12:24 PM, "Jean-Nicolas Boulay Desjardins"
<jnbdzjn...@gmail.com> wrote:

>I am surprised that it was abandoned this way. So if I want to use
>Brisk on Cassandra 1.1 I have to use the DataStax Enterprise service...
>
>On Thu, Oct 18, 2012 at 3:00 PM, Michael Kjellman
><mkjell...@barracuda.com> wrote:
>>
>> Unless you have Brisk (though as far as I know there was one fork that
>> got it working on 1.0, but nothing for 1.1, and it is not being actively
>> maintained by DataStax) or go with CFS (which comes with DSE), you are
>> not guaranteed all data is on that Hadoop node. You can take a look at
>> the forks here: https://github.com/riptano/brisk/network but I'd
>> personally be afraid to put my eggs in a basket that is certainly not
>> well supported anymore.
>>
>> job.getConfiguration().set("cassandra.consistencylevel.read", "QUORUM");
>> should get you started.
>>
>>
>> Best,
>>
>> michael
>>
>>
>>
>> From: Jean-Nicolas Boulay Desjardins <jnbdzjn...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, October 18, 2012 11:49 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: hadoop consistency level
>>
>> Why don't you look into Brisk:
>> http://www.datastax.com/docs/0.8/brisk/about_brisk
>>
>> On Thu, Oct 18, 2012 at 2:46 PM, Andrey Ilinykh <ailin...@gmail.com>
>> wrote:
>>>
>>> Hello, everybody!
>>> I'm thinking about running hadoop jobs on the top of the cassandra
>>> cluster. My understanding is - hadoop jobs read data from local nodes
>>> only. Does it mean the consistency level is always ONE?
>>>
>>> Thank you,
>>>   Andrey
>>
>>
>>
>> ----------------------------------
>> 'Like' us on Facebook for exclusive content and other resources on all
>> Barracuda Networks solutions.
>> Visit http://barracudanetworks.com/facebook

