On 20.12.2010, at 23:24, Paul Davis wrote:

> On Mon, Dec 20, 2010 at 5:20 PM, Sebastian Cohnen
> <[email protected]> wrote:
>> question inside :)
>>
>> On 20.12.2010, at 23:02, Jan Lehnardt wrote:
>>
>>> Hi,
>>>
>>> On 20 Dec 2010, at 22:32, Chenini, Mohamed wrote:
>>>
>>>> Hi,
>>>>
>>>> I found this info on the net at
>>>> http://www.slideshare.net/danglbl/schemaless-databases
>>>> [...]
>>>> Does anyone know if this was verified?
>>>
>>> I think the author's comment on slide 35 sums it up pretty nicely:
>>>
>>> "Of course this is just one (lame) test."
>>>
>>> Coming up with good numbers is hard, which means that people with easy ways
>>> to make them come up with bad ones.
>>>
>>> I've written about the difficulties of benchmarking databases on my blog:
>>>
>>> http://jan.prima.de/~jan/plok/archives/175-Benchmarks-You-are-Doing-it-Wrong.html
>>>
>>> http://jan.prima.de/~jan/plok/archives/176-Caveats-of-Evaluating-Databases.html
>>>
>>> They should give you a few pointers on why this is hard.
>>>
>>> --
>>>
>>> To the point: CouchDB generally performs best with concurrent load. In the
>>> case of loading data into CouchDB, bulk requests* will speed things up
>>> again. To push CouchDB to its write limit, you want to use concurrent bulk
>>> requests (specific numbers will depend on your data and hardware).
>>
>> Does this really speed things up? I've tried this approach (concurrent bulk
>> inserts) with small/big docs and small/big bulk chunk sizes: the difference
>> was not significant. I thought this was reasonable, since writes are
>> serialized anyway. The setup was one box generating documents, building the
>> bulks, keeping them in memory, and bulk-inserting batches of complete docs
>> (incl. simple monotonically increasing ints as doc ids) into another node.
>> Delayed commit was off.
>>
>
> I think delayed commit would need to be on there, otherwise you'll be
> hitting fsync barriers for every bulk docs call, which are serialized
> by the updater. Theoretically the speedups would come from letting the
> kernel manage the file buffers and what not.
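[For concreteness: "concurrent bulk requests" above just means several clients POSTing batches to the _bulk_docs endpoint in parallel. A rough sketch of that pattern in Python follows; the database URL, batch size, worker count, and document shape are made-up placeholders, not anything from this thread.]

# Rough sketch: fire several _bulk_docs requests in parallel.
# Database URL, batch size, and worker count are placeholders.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

DB = "http://127.0.0.1:5984/benchdb"   # hypothetical target database
BATCH_SIZE = 1000
WORKERS = 4
BATCHES = 100

def post_batch(batch_no):
    # Trivial docs with monotonically increasing integer ids, as in the test described below.
    docs = [{"_id": str(batch_no * BATCH_SIZE + i), "value": i}
            for i in range(BATCH_SIZE)]
    req = urllib.request.Request(
        DB + "/_bulk_docs",
        data=json.dumps({"docs": docs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status          # 201 when the batch was accepted

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    statuses = list(pool.map(post_batch, range(BATCHES)))
print(statuses.count(201), "of", BATCHES, "batches accepted")

[With delayed commits off, each such request pays roughly one fsync (Paul's point above), so larger batches amortize that cost.]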
delayed_commit was off because I needed to test inserting lots of data (more
than would fit nicely into memory). I wanted to figure out whether normal bulk
inserts vs. concurrent bulk inserts have an impact on insert performance; the
difference was, as I said, not significantly better or worse. BTW: I didn't
saturate the disks (mid-range SSDs), since CouchDB was eating up the CPU
(3 GHz Core 2 Duo). This was some time ago, so maybe this is more disk-bound now.

>
>>>
>>> * http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
>>>
>>> Unfortunately this means that these one-off benchmarks don't show any good
>>> numbers for CouchDB, yet fortunately this shows easily that these one-off
>>> benchmarks don't really reflect common real-world usage and should be
>>> discouraged.
>>>
>>> Hope that helps, let us know if you have any more questions :)
>>>
>>> Cheers
>>> Jan
>>> --
>>>
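[Side note: on a 1.x node the delayed_commits setting can be inspected or flipped at runtime through the _config API; a minimal sketch, assuming a local node on port 5984 with no auth (admin party), not the setup used in the test above:]

# Rough sketch: read and flip couchdb/delayed_commits over the _config API.
# Assumes a local 1.x node in admin party mode (no auth) on port 5984.
import json
import urllib.request

SERVER = "http://127.0.0.1:5984"
KEY = SERVER + "/_config/couchdb/delayed_commits"

def get_delayed_commits():
    with urllib.request.urlopen(KEY) as resp:
        return json.loads(resp.read())          # "true" or "false"

def set_delayed_commits(enabled):
    body = json.dumps("true" if enabled else "false").encode("utf-8")
    req = urllib.request.Request(KEY, data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())          # previous value

print("delayed_commits was:", get_delayed_commits())

[iirc the X-Couch-Full-Commit header can also override the commit policy per request; check the docs for your version.]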
