Re: Advices for cluster optimal configuration

Alexander Sicular Tue, 08 Mar 2011 14:58:07 -0800

Setting returnbody to false has no impact on the consistency of your data. It 
only lightens your network traffic, because Riak no longer sends the stored 
object back in the response.
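
For illustration, here is a minimal sketch of how these options appear on a 
write through Riak's HTTP interface. It only builds the request URL and sends 
nothing. The helper name riak_put_url, the host, the bucket name "articles" 
and the key are hypothetical, the quorum values w=2 and dw=2 are placeholders 
rather than a recommendation, and the /riak/<bucket>/<key> path and port 8098 
are assumed to match your installation.

```python
from urllib.parse import quote, urlencode


def riak_put_url(host, bucket, key, w=None, dw=None, returnbody=False, port=8098):
    """Build the URL for a store request (hypothetical helper, sends nothing)."""
    params = {}
    # w and dw control how many replicas must acknowledge the write;
    # they are what determines the write consistency level.
    if w is not None:
        params["w"] = w
    if dw is not None:
        params["dw"] = dw
    # returnbody only controls whether the stored object is echoed back.
    params["returnbody"] = "true" if returnbody else "false"
    path = "/riak/{}/{}".format(quote(bucket, safe=""), quote(key, safe=""))
    return "http://{}:{}{}?{}".format(host, port, path, urlencode(params))


# Same quorum settings, with and without the object echoed in the response.
quiet = riak_put_url("127.0.0.1", "articles", "42", w=2, dw=2, returnbody=False)
verbose = riak_put_url("127.0.0.1", "articles", "42", w=2, dw=2, returnbody=True)
print(quiet)
print(verbose)
```

The two URLs differ only in the returnbody flag; the w and dw values, which 
govern how many replicas must acknowledge the write, are identical.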



On Mar 8, 2011, at 2:10 PM, Thibault Dory wrote:

> Thank you for your input Jeremiah,
> 
> I would like to keep strong consistency when I'm writing data, so I will 
> keep the current settings. 
> I'm not benchmarking the bulk load but random read/update and MapReduce 
> performance. Can I turn off returnbody and still keep strong consistency 
> and still see the errors?
> 
> As I'm benchmarking MapReduce performance, I cannot set the number of reduce 
> VMs to zero. 
> 
> 2011/3/8 Jeremiah Peschka <[email protected]>
> There are a few things that you can do to speed up your load. When you're 
> writing your data, you can set both W and DW to 0 (as long as you have a way 
> to check for errors). This will shave a bit of time off of each write because 
> you'll be throwing writes against the database and hoping that they stick. 
> You can also set returnbody to false. Returnbody defaults to true, IIRC. 
> When returnbody is enabled, Riak will return the object you wrote and also 
> include the Riak-specific info (vclock, etc.). I don't care about these 
> things when I'm doing a bulk load, so I turn that sort of thing off.
> 
> Depending on the type of querying you're doing, you can adjust the JavaScript 
> VM settings. For example, if you aren't doing any reduce phases in your 
> queries, then you can set the number of reduce VMs to 0. Since you're 
> probably only doing key lookups, you can probably kill off all of the 
> JavaScript VMs.
> 
> I suspect somebody smarter will have better input and will correct me, but 
> that's my 2 cents worth.
> 
> -- 
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
> On Tuesday, March 8, 2011 at 8:34 AM, Thibault Dory wrote:
> 
>> Hello,
>> 
>> I'm benchmarking various NoSQL databases (see www.nosqlbenchmarking.com for 
>> current results and configurations used) for my master's thesis and I'm 
>> going to apply this benchmark to bigger clusters. So far I have only used 
>> a small cluster of 8 servers with a very small data set (20000 articles 
>> from Wikipedia) to conduct those tests. 
>> 
>> I will use up to 100 servers (2 GB, 4 CPUs, 80 GB HDD) from the Rackspace 
>> cloud, and the new data set is the entire English version of Wikipedia. 
>> Each article is stored as a single document with a unique ID based on an 
>> integer. You can see the implementation here: 
>> https://github.com/toflames/Wikipedia-noSQL-Benchmark/blob/master/src/implementations/riakDB.java
>> and the benchmark methodology here: 
>> http://www.slideshare.net/ThibaultDory/a-new-methodology-for-large
>> 
>> I would like to know if some of you have advice on how I could get the most 
>> out of Riak for this specific use case and on this kind of server. For 
>> example, I would like to know if there are some memory/cache tunings that 
>> could be useful to match this server size. 
>> 
>> Any other input or criticism is welcome.
>> 
>> Thank you,
>> 
>> 
>> Thibault Dory 
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 


