Hi Matthieu,

This really seems to help. I am now using a base62-encoded monotonically increasing integer, which means my doc_id goes from "0" onwards, using the alphabet:

ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz

I am now getting 3000 docs/s, more or less stable, and the size of my documents has decreased from 3 KB to 0.4 KB. I am not sure whether these metrics will worsen when the database grows, but my feeling is that the situation has improved a lot just by changing the doc_id.

I have one more question: is the alphabet I have shown above "ordered" for CouchDB?

Thanks,
Daniel

On Thu, Mar 15, 2012 at 3:09 PM, Matthieu Rakotojaona <[email protected]> wrote:
> On Thu, Mar 15, 2012 at 3:00 PM, Daniel Gonzalez <[email protected]> wrote:
> > I understand the overheads that you are referring to, but it still shocks
> > me that CouchDB needs 8 times as much space to store the data.
> >
> > Are there any guidelines on what to do/avoid in order to get a lower
> > overhead ratio?
>
> I got surprisingly good results when changing the _id design. I advise
> you to follow what is written on this page:
> http://wiki.apache.org/couchdb/Performance#File_size
>
> Basically:
> - use shorter _ids
> - use sequential _ids. If you cannot (e.g. because you have multiple
> disconnected parts that will have to merge often and that would cause
> too many clashes), you can use CouchDB's own semi-sequential generated
> uuids. Yes, uuids are contradictory to the first point.
>
> --
> Matthieu RAKOTOJAONA
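For readers following the thread: the doc_id scheme described above can be sketched roughly as below. This is an illustrative reconstruction, not Daniel's actual code; the function names are made up. Note that with this alphabet the integer 0 encodes to "A", and that variable-length encodings of this kind sort lexicographically in numeric order only among strings of equal length, which is one reason zero-padding to a fixed width is a common variation.

```python
# Illustrative sketch of a base62 doc_id encoder using the alphabet
# quoted in the mail (uppercase, then digits, then lowercase).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62

def encode_doc_id(n: int) -> str:
    """Encode a non-negative integer as a base62 string (hypothetical helper)."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_doc_id(s: str) -> int:
    """Inverse of encode_doc_id."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Encoding a monotonically increasing counter this way keeps the _ids both short and sequential, which is exactly the combination the Performance wiki page recommends.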
