On 20.12.2010, at 23:24, Paul Davis wrote:

> On Mon, Dec 20, 2010 at 5:20 PM, Sebastian Cohnen
> <[email protected]> wrote:
>> question inside :)
>> 
>> On 20.12.2010, at 23:02, Jan Lehnardt wrote:
>> 
>>> Hi,
>>> 
>>> On 20 Dec 2010, at 22:32, Chenini, Mohamed wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I found this info on the net at 
>>>> http://www.slideshare.net/danglbl/schemaless-databases
>>>> [...]
>>>> Does anyone know if this was verified?
>>> 
>>> I think the author's comment on slide 35 sums it up pretty nicely:
>>> 
>>> "Of course this is just one (lame) test."
>>> 
>>> Coming up with good numbers is hard, which means that people who take the 
>>> easy way to produce them end up with bad ones.
>>> 
>>> I've written about the difficulties of benchmarking databases on my blog:
>>> 
>>> http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
>>> http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
>>> 
>>> They should give you a few pointers on why this is hard.
>>> 
>>> --
>>> 
>>> To the point: CouchDB generally performs best under concurrent load. In the 
>>> case of loading data into CouchDB, bulk requests* will speed things up 
>>> further. To push CouchDB to its write limit, you want to use concurrent bulk 
>>> requests (specific numbers will depend on your data and hardware).
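
For anyone reading along: a "bulk request" is simply a single POST to the 
_bulk_docs endpoint with many docs in the request body. A rough Python sketch 
of one such request (the local node URL and the "benchdb" database name are 
just placeholders):

import json
import urllib.request

# One bulk request: 1000 documents in a single POST to /benchdb/_bulk_docs.
docs = [{"_id": str(i), "value": i} for i in range(1000)]
req = urllib.request.Request(
    "http://127.0.0.1:5984/benchdb/_bulk_docs",  # placeholder node + db
    data=json.dumps({"docs": docs}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 201 if the batch was accepted

Compared to one PUT per document, this amortizes the HTTP overhead over the 
whole batch.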
>> 
>> Does this really speed things up? I've tried this approach (concurrent bulk 
>> inserts) with small/big docs and small/big bulk chunk sizes: the difference 
>> was not significant. I thought this was reasonable, since writes are 
>> serialized anyway. The setup was one box generating documents, building the 
>> bulks and keeping them in memory, and bulk-inserting batches of complete docs 
>> (incl. simple monotonically increasing ints as doc ids) into another node. 
>> Delayed commit was off.
>> 
> 
> I think delayed commit would need to be on there, otherwise you'll be
> hitting fsync barriers for every bulk docs call, which are serialized
> by the updater. Theoretically the speedups would come from letting the
> kernel manage the file buffers and whatnot.

delayed_commit was off because I needed to test insertion of lots of data (more 
than would fit nicely into memory). I wanted to figure out whether normal bulk 
vs. concurrent bulks has an impact on insert performance. The difference was, 
as I said, not significantly better or worse. Btw: I didn't saturate the disks 
(mid-range SSDs), since couch was eating up the CPU (3GHz Core 2 Duo). This was 
some time ago, maybe it is more disk-bound now.
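
For completeness, here is roughly the shape of the concurrent bulk-insert test 
I mean, sketched in Python. The node URL, database name, batch size and worker 
count are made up, and the X-Couch-Full-Commit header should force a commit 
per request, matching a setup with delayed_commits = false:

import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

COUCH = "http://127.0.0.1:5984"  # placeholder node
DB = "benchdb"                   # placeholder database
BATCH = 1000                     # docs per _bulk_docs call (arbitrary)
BATCHES = 100                    # batches per worker (arbitrary)
WORKERS = 4                      # concurrent writers (arbitrary)

def bulk_insert(docs):
    # One bulk request; the full-commit header mirrors delayed_commits = false.
    req = urllib.request.Request(
        f"{COUCH}/{DB}/_bulk_docs",
        data=json.dumps({"docs": docs}).encode(),
        headers={"Content-Type": "application/json",
                 "X-Couch-Full-Commit": "true"},
        method="POST",
    )
    urllib.request.urlopen(req).read()

def worker(wid):
    # Simple monotonically increasing integer ids, namespaced per worker
    # so concurrent writers never produce conflicting _ids.
    for b in range(BATCHES):
        start = (wid * BATCHES + b) * BATCH
        bulk_insert([{"_id": str(start + i), "value": i} for i in range(BATCH)])

if __name__ == "__main__":
    try:
        # Create the database; ignore the error if it already exists.
        urllib.request.urlopen(urllib.request.Request(f"{COUCH}/{DB}", method="PUT"))
    except urllib.error.HTTPError:
        pass
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(worker, range(WORKERS)))

Comparing "normal" vs. concurrent bulks is then just a matter of running this 
with WORKERS = 1 and WORKERS > 1 against the same data.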

> 
>>> 
>>> * http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
>>> 
>>> Unfortunately this means that these one-off benchmarks don't show any good 
>>> numbers for CouchDB; fortunately it also makes it easy to show that these 
>>> one-off benchmarks don't really reflect common real-world usage and should 
>>> be discouraged.
>>> 
>>> Hope that helps, let us know if you have any more questions :)
>>> 
>>> Cheers
>>> Jan
>>> --
>>> 
>> 
>> 
