Re: general question about couch performance

Mark Hahn Thu, 17 Jan 2013 15:45:44 -0800

thx



On Thu, Jan 17, 2013 at 3:29 PM, Daniel Gonzalez <[email protected]> wrote:

> The problem is not replication; the problem is the source of the data. The
> replicators just distribute the data being inserted to the other server
> instances.
>
> You cannot use that monotonic ID generator if you are inserting data
> from different servers or applications. But if you are, say, importing
> data into a single CouchDB (replicated or not) from a third-party
> database in one batch job, you have full control over the IDs, so you can
> use that ID generator. That will improve the performance of your database,
> especially in terms of space used and view generation.
>
> On Fri, Jan 18, 2013 at 12:20 AM, Mark Hahn <[email protected]> wrote:
>
> > > you can only do this if you are in control of the IDs
> >
> > This wouldn't work with multiple servers replicating, would it?
> >
> >
> > On Thu, Jan 17, 2013 at 3:15 PM, Daniel Gonzalez <[email protected]> wrote:
> >
> > > And here you have BaseConverter:
> > >
> > > """
> > > Convert numbers from base 10 integers to base X strings and back again.
> > >
> > > Sample usage:
> > >
> > > >>> base20 = BaseConverter('0123456789abcdefghij')
> > > >>> base20.from_decimal(1234)
> > > '31e'
> > > >>> base20.to_decimal('31e')
> > > 1234
> > > """
> > >
> > > class BaseConverter(object):
> > >     decimal_digits = "0123456789"
> > >
> > >     def __init__(self, digits):
> > >         self.digits = digits
> > >
> > >     def from_decimal(self, i):
> > >         return self.convert(i, self.decimal_digits, self.digits)
> > >
> > >     def to_decimal(self, s):
> > >         return int(self.convert(s, self.digits, self.decimal_digits))
> > >
> > > >     @staticmethod
> > > >     def convert(number, fromdigits, todigits):
> > > >         # Based on http://code.activestate.com/recipes/111286/
> > > >         # Strip a leading minus sign and remember it for the result.
> > > >         if str(number)[0] == '-':
> > > >             number = str(number)[1:]
> > > >             neg = True
> > > >         else:
> > > >             neg = False
> > > >
> > > >         # Make an integer out of the number.
> > > >         x = 0
> > > >         for digit in str(number):
> > > >             x = x * len(fromdigits) + fromdigits.index(digit)
> > > >
> > > >         # Create the result in base len(todigits).
> > > >         if x == 0:
> > > >             res = todigits[0]
> > > >         else:
> > > >             res = ""
> > > >             while x > 0:
> > > >                 digit = x % len(todigits)
> > > >                 res = todigits[digit] + res
> > > >                 # Floor division keeps x an exact integer on Python 3.
> > > >                 x //= len(todigits)
> > > >             if neg:
> > > >                 res = '-' + res
> > > >         return res
> > >
> > >
> > > > On Fri, Jan 18, 2013 at 12:13 AM, Daniel Gonzalez <[email protected]> wrote:
> > >
> > > > > Also, in order to improve view performance, it is better if you use a
> > > > > short and monotonically increasing id: this is what I am using for one
> > > > > of my databases with millions of documents:
> > > >
> > > > class MonotonicalID:
> > > >
> > > > >     def __init__(self, cnt=0):
> > > > >         self.cnt = cnt
> > > > >         self.base62 = BaseConverter(
> > > > >             'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
> > > > >         # This alphabet is better for couchdb, since it follows the
> > > > >         # ordering of the Unicode Collation Algorithm
> > > > >         self.base64_couch = BaseConverter(
> > > > >             '-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')
> > > >
> > > >     def get(self):
> > > >         res = self.base64_couch.from_decimal(self.cnt)
> > > >         self.cnt += 1
> > > >         return res
> > > >
> > > > > Doing this will:
> > > > > - save space in the database, since the id starts small: take into
> > > > >   account that the id is used in lots of internal data structures in
> > > > >   couchdb, so making it short will save lots of space in a big database
> > > > > - speed up certain operations, since the ids are ordered (in the
> > > > >   couchdb sense)
> > > > >
> > > > > Drawback: you can only do this if you are in control of the IDs (you
> > > > > know that nobody else is going to be generating IDs)
> > > >
> > > > On Thu, Jan 17, 2013 at 8:00 PM, Mark Hahn <[email protected]> wrote:
> > > >
> > > >> Thanks for the tips.  Keep them coming.
> > > >>
> > > > >> I'm going to try everything I can.  If I find anything surprising I'll
> > > > >> let everyone know.
> > > >>
> > > >>
> > > > >> On Thu, Jan 17, 2013 at 4:54 AM, Daniel Gonzalez <[email protected]> wrote:
> > > >>
> > > >> > Are you doing single writes or batch writes?
> > > > >> > I managed to improve the write performance by collecting the
> > > > >> > documents and sending them in a single request.
> > > > >> > The same applies to read accesses.
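The batching advice above can be sketched as follows, assuming CouchDB's `_bulk_docs` endpoint for writes and a keyed `_all_docs` request for reads. The helper names, the server URL and the database name are hypothetical, and `send` needs a live server, so it is defined but not called here.

```python
import json
import urllib.request


def bulk_write_request(base_url, db, docs):
    """One POST to /{db}/_bulk_docs that carries every document in docs."""
    body = json.dumps({"docs": list(docs)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_read_request(base_url, db, ids):
    """One POST to /{db}/_all_docs that fetches the documents for all ids."""
    body = json.dumps({"keys": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_all_docs?include_docs=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chunked(items, size):
    """Split a list into batches so that a single request stays bounded."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send(request):
    """Send a prepared request and decode the JSON reply (live server needed)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A caller would collect documents in a list and issue one `send(bulk_write_request(...))` per chunk instead of one request per document; a suitable chunk size depends on document size and is best found by measuring.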
> > > >> >
> > > > >> > On Wed, Jan 16, 2013 at 9:17 PM, Mark Hahn <[email protected]> wrote:
> > > >> >
> > > > >> > > My couchdb is seeing a typical request rate of about 100/sec
> > > > >> > > when it is maxed out.  This is typically 10 reads per write.
> > > > >> > > This is disappointing.  I was hoping for 3 to 5 ms per op, not
> > > > >> > > 10 ms.  What performance numbers are others seeing?
> > > > >> > >
> > > > >> > > I have 35 views with only 50 to 100 entries per view.  My db is
> > > > >> > > less than a gigabyte with a few thousand active docs.
> > > > >> > >
> > > > >> > > I'm running on a medium ec2 instance with ephemeral disk.  I
> > > > >> > > assume I am IO bound as the cpu is not maxing out.
> > > > >> > >
> > > > >> > > How much worse would this get if the db also had to handle
> > > > >> > > replication between multiple servers?
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
