Also, to improve view performance, it is better to use short, monotonically
increasing ids. This is what I use for one of my databases, which holds
millions of documents:
    # BaseConverter is a separate base-conversion helper providing
    # from_decimal().
    class MonotonicalID:
        def __init__(self, cnt=0):
            # Next counter value to encode.
            self.cnt = cnt
            self.base62 = BaseConverter(
                'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
            # This alphabet is better for couchdb, since it follows the
            # order of the Unicode Collation Algorithm
            self.base64_couch = BaseConverter(
                '-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')

        def get(self):
            # Encode the current counter value, then advance the counter.
            res = self.base64_couch.from_decimal(self.cnt)
            self.cnt += 1
            return res
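BaseConverter itself is not defined in this message. As a rough sketch of
what it might look like, assuming only the from_decimal() method used above
(this implementation is an assumption for illustration, not the actual
helper):

```python
class BaseConverter:
    """Hypothetical stand-in for the BaseConverter used above."""

    def __init__(self, digits):
        # Every character must be unique, otherwise two counter values
        # could map to the same id.
        if len(set(digits)) != len(digits):
            raise ValueError("alphabet contains duplicate characters")
        self.digits = digits
        self.base = len(digits)

    def from_decimal(self, number):
        """Encode a non-negative integer using this converter's alphabet."""
        if number < 0:
            raise ValueError("only non-negative integers are supported")
        if number == 0:
            return self.digits[0]
        out = []
        while number > 0:
            # Peel off the least significant digit first.
            number, rem = divmod(number, self.base)
            out.append(self.digits[rem])
        # Digits were collected least significant first, so reverse them.
        return ''.join(reversed(out))


# The alphabet from the message above (64 characters).
COUCH_ALPHABET = '-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ'
couch = BaseConverter(COUCH_ALPHABET)
first_ids = [couch.from_decimal(n) for n in range(3)]
```

With this sketch, ids stay one character long for the first 64 documents and
gain a character each time the counter crosses a power of 64.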
Doing this will:
- save space in the database, since the ids start small. The id is used in
  many of couchdb's internal data structures, so keeping it short saves a
  lot of space in a big database
- speed up certain operations, since the ids are ordered (in the couchdb
  sense)
Drawback: you can only do this if you are in control of the ids (that is,
you know that nobody else is going to be generating them).
On Thu, Jan 17, 2013 at 8:00 PM, Mark Hahn <[email protected]> wrote:
> Thanks for the tips. Keep them coming.
>
> I'm going to try everything I can. If I find anything surprising I'll let
> everyone know.
>
>
> On Thu, Jan 17, 2013 at 4:54 AM, Daniel Gonzalez <[email protected]> wrote:
>
> > Are you doing single writes or batch writes?
> > I managed to improve the write performance by collecting the documents
> > and sending them in a single access.
> > The same applies for read accesses.
> >
> > On Wed, Jan 16, 2013 at 9:17 PM, Mark Hahn <[email protected]> wrote:
> >
> > > My couchdb is seeing a typical request rate of about 100/sec when it is
> > > maxed out. This is typically 10 reads/write. This is disappointing. I
> > > was hoping for 3 to 5 ms per op, not 10 ms. What performance numbers are
> > > others seeing?
> > >
> > > I have 35 views with only 50 to 100 entries per view. My db is less
> > > than a gigabyte with a few thousand active docs.
> > >
> > > I'm running on a medium ec2 instance with ephemeral disk. I assume I
> > > am IO bound as the cpu is not maxing out.
> > >
> > > How much worse would this get if the db also had to handle replication
> > > between multiple servers?
> > >
> >
>