On Mon, Dec 20, 2010 at 5:20 PM, Sebastian Cohnen <[email protected]> wrote:
> question inside :)
>
> On 20.12.2010, at 23:02, Jan Lehnardt wrote:
>
>> Hi,
>>
>> On 20 Dec 2010, at 22:32, Chenini, Mohamed wrote:
>>
>>> Hi,
>>>
>>> I found this info on the net at
>>> http://www.slideshare.net/danglbl/schemaless-databases
>>> [...]
>>> Does anyone know if this was verified?
>>
>> I think the author's comment on slide 35 sums it up pretty nicely:
>>
>> "Of course this is just one (lame) test."
>>
>> Coming up with good numbers is hard, which means that people looking for
>> easy ways to produce them come up with bad ones.
>>
>> I've written about the difficulties of benchmarking databases on my blog:
>>
>> http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
>> http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
>>
>> They should give you a few pointers on why this is hard.
>>
>> --
>>
>> To the point: CouchDB generally performs best under concurrent load. In the
>> case of loading data into CouchDB, bulk requests* will speed things up
>> further. To push CouchDB to its write limit, you want to use concurrent bulk
>> requests (the specific numbers will depend on your data and hardware).
>
> Does this really speed things up? I've tried this approach (concurrent bulk
> inserts) with small/big docs and small/big bulk chunk sizes: the difference
> was not significant. I thought this was reasonable, since writes are
> serialized anyway. The setup was one box generating documents, building the
> bulks, keeping them in memory, and bulk-inserting batches of complete docs
> (incl. simple monotonically increasing ints as doc ids) into another node.
> Delayed commit was off.
>
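For reference, this is roughly the kind of concurrent bulk load Jan is
describing. It is only a sketch: the db name, doc shape, batch size and
worker count are invented, and it assumes a CouchDB listening on
127.0.0.1:5984.

# Sketch: several writers POSTing to _bulk_docs concurrently.
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

DB = "http://127.0.0.1:5984/benchdb"   # db name is made up
BATCH, WORKERS, TOTAL = 1000, 4, 100000

def post_batch(docs):
    # one _bulk_docs call per batch; commit behaviour is whatever the
    # server is configured for
    body = json.dumps({"docs": docs}).encode("utf-8")
    req = Request(DB + "/_bulk_docs", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.status

def batches():
    for start in range(0, TOTAL, BATCH):
        yield [{"_id": "%08d" % i, "value": i}
               for i in range(start, start + BATCH)]

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    print(sum(1 for _ in pool.map(post_batch, batches())), "bulk requests sent")

The idea is simply to keep several requests in flight at once so the server
always has the next batch waiting when the updater finishes the current one.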
I think delayed commit would need to be on there, otherwise you'll be hitting
an fsync barrier for every _bulk_docs call, and those are serialized by the
updater. Theoretically the speedups would come from letting the kernel manage
the file buffers and whatnot.

>> * http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
>>
>> Unfortunately this means that these one-off benchmarks don't show any good
>> numbers for CouchDB, yet fortunately it is easy to show that such one-off
>> benchmarks don't really reflect common real-world usage and should be
>> discouraged.
>>
>> Hope that helps, let us know if you have any more questions :)
>>
>> Cheers
>> Jan
>> --
>>
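P.S. If you want to re-run your test with delayed commits on, it should be a
one-line change. Untested sketch below; I believe the _config endpoint and the
X-Couch-Full-Commit header behave like this on 1.0.x, and the db name is made
up:

# Enable delayed commits server-wide ([couchdb] delayed_commits = true),
# or skip the immediate fsync for a single request via X-Couch-Full-Commit.
import json
from urllib.request import Request, urlopen

BASE = "http://127.0.0.1:5984"

# server-wide: the _config API takes the value as a JSON string
urlopen(Request(BASE + "/_config/couchdb/delayed_commits",
                data=b'"true"', method="PUT"))

# per-request: ask this one _bulk_docs call not to wait on a full commit
docs = {"docs": [{"_id": "00000001", "value": 1}]}
req = Request(BASE + "/benchdb/_bulk_docs",
              data=json.dumps(docs).encode("utf-8"),
              headers={"Content-Type": "application/json",
                       "X-Couch-Full-Commit": "false"})
print(urlopen(req).status)

Either way, remember to POST /benchdb/_ensure_full_commit (or flip the setting
back) at the end of the run so the data actually hits disk before you stop
timing.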
