I'd be intrigued to know how we could store the raw JSON on disk given the make_blocks behavior but, yes, the reason CouchDB isn't giving sendfile()-like performance is, at least in part, the JSON<->Erlang conversion, plus other work such as reading B-tree nodes to find the data (even when they come from cache), etc.
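To make the conversion cost concrete, here is a small Python analogy (not CouchDB's Erlang code): one store keeps decoded terms and must re-serialize them to JSON on every read, while the other keeps the raw JSON bytes and returns them unchanged. The document shape and the names `read_from_terms` and `read_raw` are hypothetical, and timings depend on the machine, so no figures are claimed here.

```python
import json
import timeit

# Hypothetical stand-in document; this is an analogy, not CouchDB's on-disk format.
doc = {"_id": "doc-1", "items": [{"n": i, "tags": ["a", "b"]} for i in range(1000)]}

# Store 1: keep decoded terms (analogous to Eterms). Every read re-serializes to JSON.
stored_terms = json.loads(json.dumps(doc))

def read_from_terms():
    return json.dumps(stored_terms).encode("utf-8")

# Store 2: keep the raw JSON bytes. A read just hands them back unchanged.
stored_raw = json.dumps(doc).encode("utf-8")

def read_raw():
    return stored_raw

if __name__ == "__main__":
    n = 200
    t_terms = timeit.timeit(read_from_terms, number=n)
    t_raw = timeit.timeit(read_raw, number=n)
    print(f"re-encode per read: {t_terms / n * 1e6:.1f} us")
    print(f"raw bytes per read: {t_raw / n * 1e6:.1f} us")
```

Both functions return identical bytes; the first simply repeats the serialization work on every request, which is the kind of per-read CPU cost discussed in this thread.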
What I was driving at with "use a benchmarking tool" was eliminating artifacts like the time curl takes to open the connection. All the tools I listed record the time of the actual request/response. With ab and nodeload I get similar figures, though I can also crank up concurrency and get the same per-request numbers (but 10x the total throughput). Without those kinds of options (number of users, TCP keep-alive, HTTP keep-alive) it's very hard to discuss and compare benchmarks. Curl just isn't enough.

B.

On 23 March 2012 13:23, Jonathan Williamson wrote:
> Volker,
>
> Thanks for the input, that all sounds likely! It's not a massive
> problem for us to store a raw copy of our data alongside our Couch
> databases.
>
> I do, however, think this falls slightly under "unexpected behaviour", as
> intuitively I think a lot of people would expect raw read speeds to be
> pretty fast and not require such heavy CPU usage. That said, it's easy
> to work around; it was just a bit of a surprise to come across.
>
> Love CouchDB for all the things it does for us so well - it's a great product!
>
> Jon.
>
> On Fri, Mar 23, 2012 at 1:17 PM, Volker Mische wrote:
>> I agree that using a trusted benchmarking tool is the way to go. Though
>> what Jonathan sees is pretty clear: it's the JSON -> Eterm -> String
>> conversion that CouchDB is currently doing. Filipe proposed a patch that
>> stores the raw JSON on disk, to get rid of most of this conversion. I don't
>> remember exactly, but I'm pretty sure he provided sensible benchmarking
>> results back then.
>>
>> Hence the point of this thread shouldn't be "go, do it properly", but
>> rather: we have a clue why it is so slow. We don't store raw JSON on disk,
>> but Eterms, which need to be assembled into a string every time you request it.
>>
>> Cheers,
>> Volker
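A rough Python sketch of what tools like ab or nodeload measure that a single curl call does not: each worker opens one HTTP/1.1 keep-alive connection up front, so connection setup is excluded from the timings, then times each request/response; `concurrency` is the number of workers. The local stub server (`Handler`, `start_test_server`) is hypothetical and exists only so the sketch runs standalone; pointing `bench` at a real CouchDB document URL is an assumption that has not been tested here.

```python
import http.client
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY = b'{"ok":true}'

class Handler(BaseHTTPRequestHandler):
    # Hypothetical stub server. HTTP/1.1 plus Content-Length allows keep-alive.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence per-request logging

def worker(host, port, path, n):
    """Send n GETs over one keep-alive connection; return per-request latencies."""
    conn = http.client.HTTPConnection(host, port)
    conn.connect()  # connect before timing so setup cost is excluded
    latencies = []
    try:
        for _ in range(n):
            start = time.perf_counter()
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            latencies.append(time.perf_counter() - start)
    finally:
        conn.close()
    return latencies

def bench(host, port, path, concurrency, per_worker):
    """Run `concurrency` workers in parallel; return (latencies, requests/second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker, host, port, path, per_worker)
                   for _ in range(concurrency)]
        latencies = [lat for f in futures for lat in f.result()]
    elapsed = time.perf_counter() - start
    return latencies, len(latencies) / elapsed

def start_test_server():
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

if __name__ == "__main__":
    server = start_test_server()
    host, port = server.server_address
    for c in (1, 10):
        lat, rps = bench(host, port, "/", c, 200)
        print(f"concurrency={c}: median {statistics.median(lat) * 1e3:.2f} ms, {rps:.0f} req/s")
    server.shutdown()
```

Reporting median per-request latency alongside total requests per second at several concurrency levels is the kind of comparison the thread asks for; the stub server's numbers themselves say nothing about CouchDB.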
