Are you really extrapolating MB/s from a single curl command?
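A single run is easy to skew by cache state or TCP slow start, so it is worth repeating the request and letting curl report its own numbers. A rough sketch, assuming a placeholder URL (swap in whichever document or file is under test):

```shell
# Placeholder URL - point this at the CouchDB doc (or Nginx file) under test.
URL="${URL:-http://127.0.0.1:5984/db/docid}"

# Five timed fetches; curl prints bytes, wall time, and B/s for each run.
# ("|| true" keeps the loop going if the server isn't reachable.)
for i in 1 2 3 4 5; do
  curl -s -o /dev/null \
       -w '%{size_download} bytes in %{time_total}s (%{speed_download} B/s)\n' \
       "$URL" || true
done

# Converting one measurement by hand: 87031808 bytes in 1.0s is 83.0 MB/s.
awk -v b=87031808 -v t=1.0 'BEGIN { printf "%.1f MB/s\n", b / t / 1048576 }'
```

Averaging the per-run figures (and discarding the first, cold-cache run) gives a fairer comparison than one number.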
On 23 March 2012 12:15, Jonathan Williamson <[email protected]> wrote:
> Jason,
>
> Apologies if I came off too demanding - I don't mean to be! I just
> want to understand why, so that we can make a sensible decision as to
> how to move forward with or without CouchDB in the long term.
>
> My initial comparison was actually to reading a file off disk, but I
> thought that unfair, so added the overhead of a web server. That's not
> to say I'd expect CouchDB to match the performance of Nginx 1:1, but
> currently it is one to two orders of magnitude slower for the task I
> described.
>
> It's interesting to hear that CouchDB compares the document to a
> checksum prior to serving it - do you have any idea what overhead this
> adds? What's the reasoning behind it? (I mean, data could be corrupted
> in transmission, or in memory after checksumming, etc.)
>
> The main reasons I would expect CouchDB to be fast at this specific
> operation are:
>
> - Prebuilt indexes: I was surprised these didn't allow CouchDB to
>   very quickly identify where to retrieve the document from within its
>   data files.
> - Internal storage format: it seems to be almost raw JSON in the files
>   with a bit of metadata, which should allow for (almost) direct
>   streaming to the client.
>
> It's not that CouchDB is slower than Nginx per se; it's that it is
> massively slower. For example, having flushed the file caches on my
> dev box, Nginx can serve a large static file at 83 MB/s (25 times
> faster than CouchDB on the same hardware).
>
> On Fri, Mar 23, 2012 at 11:56 AM, Jason Smith <[email protected]> wrote:
>> CouchDB verifies that the document contents match a checksum, which
>> does impose computation and codec overhead, yes.
>>
>> Considering that CouchDB stores multiple sorted indices to the
>> documents in a database which is itself a filesystem file, in a safe
>> append-only format, how would you justify an expectation of static
>> Nginx performance? Surely CouchDB must open the file (right there you
>> have tied Nginx at best) and then seek through its metadata to fetch
>> the doc. Note, my disagreement with you is not fundamental, just one
>> of degree. Surely it is fair to give CouchDB some elbow room to work,
>> to pay for its benefits?
>>
>> Back to document parsing: CouchDB does do that, and this is a huge
>> opportunity for improvement. I believe Filipe has indeed proposed
>> something much like you describe: store the UTF-8 JSON directly on
>> disk.
>>
>> I'm excited that this conversation can paint a clearer picture of
>> what we expect from CouchDB, to find a speed at which we could say,
>> "this is slower than Brand X, but it's worth it."
>>
>> On Fri, Mar 23, 2012 at 11:41 AM, Jonathan Williamson <[email protected]> wrote:
>>> As I'm requesting the documents in the exact format I submitted them
>>> (with no transformations or extra information), I'd expect something
>>> not far off a static file request from Nginx. As far as I can tell,
>>> the .couch files aren't compressed (though that wouldn't cause such
>>> slow performance on an i5 anyway) and appear to contain the original
>>> documents almost "as is".
>>>
>>> The other side effect is that while fetching the documents the CPU
>>> usage rises to 100%, which suggests, I guess, that CouchDB is
>>> reading, deserialising, serialising, and then streaming the
>>> document. But it doesn't seem like that should be necessary, really?
