On Thu, Mar 15, 2012 at 10:14 PM, Daniel Gonzalez <[email protected]> wrote:
> Hi Matthieu,
>
> This really seems to help. I am using now a base62 encoded monotonically
> increasing integer, which means my doc_id goes from "0" onwards, using the
> alphabet:
>
> ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz
>
> I am getting now 3000 docs/s, more or less stable, and the size of my
> documents has decreased from 3KB to 0.4 KB.
> I am not sure whether these metrics will worsen when the database grows, but
> my feeling is that the situation has improved a lot just by changing the
> doc_id.

Hi, Daniel. That's great news! Also, I have an update from a CouchDB 1.2.0 test.

I have a database here with 10 million documents, most of them several
KB of English text. Upgrading to version 1.2 changed the database size
from 38GB to 9.2GB, which is now 0.94 KB per document.

So you should see an even greater improvement when 1.2.0 comes out
Real Soon Now.

> I have one more question. Is the alphabet I have shown above "ordered" for
> couchdb?

The sort order may not be quite what you expect, especially if you
work with Unix or servers a lot.

It is described here:
http://wiki.apache.org/couchdb/View_collation#Collation_Specification

Basically CouchDB follows (uses!) ICU. The major point is that strings
which differ in their letters are compared case-insensitively, but
strings which differ only in case are compared case-sensitively (lower
case first). To me, it more or less follows how an English dictionary
would do it.
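
To make that concrete, here is a rough Python sketch of the ordering
for ASCII letters and digits only. It is an approximation for
illustration, not CouchDB's or ICU's actual code: the name
approx_icu_key is made up for this example, and it assumes digits sort
before letters, which is ICU's default. It also sorts your alphabet by
plain code point, just for comparison.

```python
import string

def approx_icu_key(s):
    """Approximate ICU-style sort key (hypothetical helper, ASCII alphanumerics only)."""
    primary = []   # first pass: case-insensitive, digits before letters (assumed ICU default)
    tertiary = []  # tie-break: lower case before upper case
    for ch in s:
        if ch not in string.ascii_letters + string.digits:
            raise ValueError("sketch only handles ASCII letters and digits")
        primary.append((0, ch) if ch.isdigit() else (1, ch.lower()))
        tertiary.append(1 if ch.isupper() else 0)
    # Case is only consulted when the case-insensitive comparison ties.
    return (primary, tertiary)

# The alphabet from the quoted message.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"

by_icu = "".join(sorted(ALPHABET, key=approx_icu_key))
by_bytes = "".join(sorted(ALPHABET))  # plain code point order, for comparison

print(by_icu)
print(by_bytes)
print(by_icu == ALPHABET, by_bytes == ALPHABET)
```

Under this approximation the alphabet as you wrote it is not in sorted
order in either scheme, so if you want your ids to sort in the same
order as the integers they encode, you would probably want to reorder
the alphabet (and pad the ids to a fixed width).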

-- 
Iris Couch
