On Tue, Jan 27, 2009 at 1:22 PM, Brian Candler <[email protected]> wrote:
>> In a recent benchmark: Inserting 100b docs into an empty database:
>> ~6200 docs/s. Inserting the same docs in a 90 000 000 doc db:
>> 6000 docs/s (with sequential ids). Data scales.
>
> This is very interesting - one of the applications I'm thinking of has a
> profile just like this (warehouse for RADIUS accounting records)
>
> I have a couple of questions relating to this.
>
> - To get such high performance, is it necessary to use _bulk_docs, or was
>  it achieved with regular PUT operations?

This was done with a pure Erlang interface and a bulk size of 1k docs.
The HTTP interface may add only minimal overhead if your JSON is not
complex and you use _bulk_docs.
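
For the HTTP route, here is a rough sketch of what client-side batching
for _bulk_docs might look like. The 1,000-doc batch size mirrors the
benchmark; the helper names (chunked, bulk_docs_body) and the sample
records are made up for illustration, and nothing here talks to a
server.

```python
import json
from itertools import islice


def chunked(docs, size=1000):
    """Yield successive lists of at most `size` docs."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_docs_body(batch):
    """Serialize one batch as a _bulk_docs request body: {"docs": [...]}."""
    return json.dumps({"docs": batch}).encode("utf-8")


# Hypothetical records; sequential, zero-padded ids as in the benchmark.
docs = [{"_id": "%010d" % i, "framed_ip_address": "192.168.1.1"}
        for i in range(2500)]
bodies = [bulk_docs_body(b) for b in chunked(docs)]
# Each body would be POSTed to /<db>/_bulk_docs
# with Content-Type: application/json.
```

With 2,500 docs this yields three request bodies, so the per-request
HTTP and JSON parsing cost is spread over up to 1,000 docs each.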

>
> - Does Couchdb commit its data to stable storage *before* returning a HTTP
>  response? That is, once you receive a HTTP success response, you can be
>  sure that the data has already hit the disk?

There is a header you can send which forces a full fsync before the
response is returned. By default, CouchDB responds once the write to the
file has been issued, trusting the OS (which usually reports success
before the data is physically on disk, in the interest of speed).
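
A minimal sketch of sending such a header from a client. I am assuming
the header in question is X-Couch-Full-Commit, so check that name
against the CouchDB version you run. The database URL, document id and
record are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def full_commit_request(db_url, doc_id, doc):
    """Build (but do not send) a PUT that asks for an fsync before the reply."""
    return urllib.request.Request(
        url="%s/%s" % (db_url.rstrip("/"), doc_id),
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumed header name; verify against your CouchDB version.
            "X-Couch-Full-Commit": "true",
        },
    )


# Hypothetical database URL and record.
req = full_commit_request("http://localhost:5984/radius", "acct-0001",
                          {"framed_ip_address": "192.168.1.1"})
# urllib.request.urlopen(req) would actually send it.
```

Expect a request like this to be much slower than the default, since
each response then waits for the disk.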

>
> If Couchdb can handle 6,000 individual PUT requests per second, *and* only
> respond after they are committed to stable storage, then I think it must be
> batching the writes and hence delaying the responses somewhat (by how much?
> I couldn't see a tunable parameter for this)

This involved about 6 bulk saves per second (roughly 6,000 docs/s at 1k
docs per save), with the OS left to manage the disk (no forced fsync).
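
If the client ends up doing its own batching, a small buffer that
flushes on size or age keeps the added delay bounded. This is only a
sketch: the Batcher class, its max_docs and max_age parameters, and the
send callback are hypothetical names, not anything CouchDB provides. In
a real client, send would POST the batch to _bulk_docs.

```python
import time


class Batcher:
    """Buffer docs and flush when the batch is full or old enough."""

    def __init__(self, send, max_docs=1000, max_age=0.2,
                 clock=time.monotonic):
        self.send = send          # callable taking a list of docs
        self.max_docs = max_docs  # flush at this many buffered docs
        self.max_age = max_age    # seconds the oldest doc may wait
        self.clock = clock
        self.buf = []
        self.first_at = None      # arrival time of the oldest buffered doc

    def add(self, doc):
        if not self.buf:
            self.first_at = self.clock()
        self.buf.append(doc)
        self.maybe_flush()

    def maybe_flush(self):
        """Called on every add; also call from a timer so quiet periods flush."""
        if not self.buf:
            return
        too_full = len(self.buf) >= self.max_docs
        too_old = self.clock() - self.first_at >= self.max_age
        if too_full or too_old:
            self.flush()

    def flush(self):
        batch, self.buf = self.buf, []
        if batch:
            self.send(batch)


sent = []
b = Batcher(sent.append, max_docs=3, max_age=60)
for i in range(7):
    b.add({"n": i})
b.flush()  # push out the remainder
```

With max_age at 0.2 seconds a record waits at most about that long
before being handed to send, provided maybe_flush is also driven by a
timer.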

>
> However if this performance is only achievable using _bulk_docs, I'll have
> to write my RADIUS server / Couchdb client to perform its own local
> batching and POST these batches a few times per second.
>
> I presume that batching also affects disk space used (before compaction) - I
> wouldn't want each 200 byte RADIUS record taking up 4KB :-)
>
> Final question: does Couchdb perform any gzip-like compression when writing
> the JSON to disk? These 200 byte RADIUS records will become a lot larger
> when expanded into verbose JSON.

Couch uses Erlang's term_to_binary for saving, I believe with the
compressed option, which applies zlib (gzip-style) compression. This is
worth verifying; it's been a while since I toured that part of the
source.
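
To get a feel for how much verbose JSON shrinks under this kind of
compression, here is a sketch using Python's zlib. It is not Erlang's
external term format, so the numbers are only indicative, and every
field except framed_ip_address is invented.

```python
import json
import zlib

# Hypothetical accounting record; only framed_ip_address is from the thread.
record = {
    "framed_ip_address": "192.168.1.1",
    "acct_status_type": "Interim-Update",
    "acct_session_time": 3600,
    "user_name": "example",
}

one = json.dumps(record).encode("utf-8")
many = json.dumps([record] * 1000).encode("utf-8")

one_z = zlib.compress(one)
many_z = zlib.compress(many)

print("single record: %d -> %d bytes" % (len(one), len(one_z)))
print("1000 records:  %d -> %d bytes" % (len(many), len(many_z)))
```

Identical records exaggerate the gain, and a single small record
compresses far less well than a batch, so measure with real data before
relying on any ratio.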

>
>  {
>    "framed_ip_address":"192.168.1.1",  // 6 bytes in original packet
>    ... etc
>  }
>
> Regards,
>
> Brian.
>



-- 
Chris Anderson
http://jchris.mfdz.com
