Hi Matthieu,

This really seems to help. I am now using a base62-encoded monotonically increasing integer, which means my doc_id goes from "0" onwards, using the alphabet:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz

I am now getting 3000 docs/s, more or less stable, and the size of my documents has decreased from 3 KB to 0.4 KB. I am not sure whether these metrics will worsen when the database grows, but my feeling is that the situation has improved a lot just by changing the doc_id.

I have one more question: is the alphabet I have shown above "ordered" for CouchDB?

Thanks,
Daniel

On Thu, Mar 15, 2012 at 3:09 PM, Matthieu Rakotojaona <[email protected]> wrote:
> On Thu, Mar 15, 2012 at 3:00 PM, Daniel Gonzalez <[email protected]> wrote:
> > I understand the overheads that you are referring to, but it still shocks
> > me that CouchDB needs 8 times as much space to store the data.
> >
> > Are there any guidelines on what to do/avoid in order to get a lower
> > overhead ratio?
>
> I got surprisingly good results when changing the _id design. I advise
> you to follow what is written on this page:
> http://wiki.apache.org/couchdb/Performance#File_size
>
> Basically:
> - use shorter _ids
> - use sequential _ids. If you cannot (e.g. because you have multiple
> disconnected parts that will have to merge often and that would cause
> too many clashes), you can use CouchDB's own semi-sequential generated
> uuids. Yes, uuids are contradictory to the first point.
>
> --
> Matthieu RAKOTOJAONA
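For readers following the thread: the doc_id scheme described above can be sketched roughly as below. This is an illustrative reconstruction, not Daniel's actual code; the function names are made up. Note that with this alphabet the integer 0 encodes to "A", and that variable-length encodings of this kind sort lexicographically in numeric order only among strings of equal length, which is one reason zero-padding to a fixed width is a common variation.

```python
# Illustrative sketch of a base62 doc_id encoder using the alphabet
# quoted in the mail (uppercase, then digits, then lowercase).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62

def encode_doc_id(n: int) -> str:
    """Encode a non-negative integer as a base62 string (hypothetical helper)."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_doc_id(s: str) -> int:
    """Inverse of encode_doc_id."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Encoding a monotonically increasing counter this way keeps the _ids both short and sequential, which is exactly the combination the Performance wiki page recommends.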
