thx
On Thu, Jan 17, 2013 at 3:29 PM, Daniel Gonzalez <[email protected]> wrote:

> The problem is not replication; the problem is the source of the data. The
> replicators will just distribute the data that is being inserted to other
> server instances.
>
> You cannot use that monotonic ID generator if you are inserting data
> from different servers or applications. But if you are, let's say,
> importing data to a single couchdb (replication or not) from a third-party
> database in one batch job, you have full control of the IDs, so you can use
> that ID generator. That will improve the performance of your database,
> especially in relation to space used and view generation.
>
> On Fri, Jan 18, 2013 at 12:20 AM, Mark Hahn <[email protected]> wrote:
>
> > > you can only do this if you are in control of the IDs
> >
> > This wouldn't work with multiple servers replicating, would it?
> >
> >
> > On Thu, Jan 17, 2013 at 3:15 PM, Daniel Gonzalez <[email protected]> wrote:
> >
> > > And here you have BaseConverter:
> > >
> > > """
> > > Convert numbers from base 10 integers to base X strings and back again.
> > >
> > > Sample usage:
> > >
> > > >>> base20 = BaseConverter('0123456789abcdefghij')
> > > >>> base20.from_decimal(1234)
> > > '31e'
> > > >>> base20.to_decimal('31e')
> > > 1234
> > > """
> > >
> > > class BaseConverter(object):
> > >     decimal_digits = "0123456789"
> > >
> > >     def __init__(self, digits):
> > >         self.digits = digits
> > >
> > >     def from_decimal(self, i):
> > >         return self.convert(i, self.decimal_digits, self.digits)
> > >
> > >     def to_decimal(self, s):
> > >         return int(self.convert(s, self.digits, self.decimal_digits))
> > >
> > >     @staticmethod
> > >     def convert(number, fromdigits, todigits):
> > >         # Based on http://code.activestate.com/recipes/111286/
> > >         if str(number)[0] == '-':
> > >             number = str(number)[1:]
> > >             neg = 1
> > >         else:
> > >             neg = 0
> > >
> > >         # make an integer out of the number
> > >         x = 0
> > >         for digit in str(number):
> > >             x = x * len(fromdigits) + fromdigits.index(digit)
> > >
> > >         # create the result in base 'len(todigits)'
> > >         if x == 0:
> > >             res = todigits[0]
> > >         else:
> > >             res = ""
> > >             while x > 0:
> > >                 digit = x % len(todigits)
> > >                 res = todigits[digit] + res
> > >                 x = x // len(todigits)  # integer division, exact for large values
> > >         if neg:
> > >             res = '-' + res
> > >         return res
> > >
> > >
> > > On Fri, Jan 18, 2013 at 12:13 AM, Daniel Gonzalez <[email protected]> wrote:
> > >
> > > > Also, in order to improve view performance, it is better if you use a
> > > > short and monotonically increasing id. This is what I am using for one of
> > > > my databases with millions of documents:
> > > >
> > > > class MonotonicalID:
> > > >
> > > >     def __init__(self, cnt = 0):
> > > >         self.cnt = cnt
> > > >         self.base62 = BaseConverter('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
> > > >         # This alphabet is better for couchdb, since it follows the
> > > >         # ordering of the Unicode Collation Algorithm
> > > >         self.base64_couch = BaseConverter('-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')
> > > >
> > > >     def get(self):
> > > >         res = self.base64_couch.from_decimal(self.cnt)
> > > >         self.cnt += 1
> > > >         return res
> > > >
> > > > Doing this will:
> > > > - save space in the database, since the id starts small: take into account
> > > >   that the id is used in lots of internal data structures in couchdb, so
> > > >   making it short will save lots of space in a big database
> > > > - speed up certain operations, since the ids are ordered (in the couchdb
> > > >   sense)
> > > >
> > > > Drawback: you can only do this if you are in control of the IDs (you know
> > > > that nobody else is going to be generating IDs)
> > > >
> > > > On Thu, Jan 17, 2013 at 8:00 PM, Mark Hahn <[email protected]> wrote:
> > > >
> > > > > Thanks for the tips. Keep them coming.
> > > > >
> > > > > I'm going to try everything I can. If I find anything surprising I'll
> > > > > let everyone know.
> > > > >
> > > > >
> > > > > On Thu, Jan 17, 2013 at 4:54 AM, Daniel Gonzalez <[email protected]> wrote:
> > > > >
> > > > > > Are you doing single writes or batch writes?
> > > > > > I managed to improve the write performance by collecting the documents
> > > > > > and sending them in a single access.
> > > > > > The same applies for read accesses.
> > > > > >
> > > > > > On Wed, Jan 16, 2013 at 9:17 PM, Mark Hahn <[email protected]> wrote:
> > > > > >
> > > > > > > My couchdb is seeing a typical request rate of about 100/sec when it
> > > > > > > is maxed out. This is typically 10 reads/write. This is disappointing.
> > > > > > > I was hoping for 3 to 5 ms per op, not 10 ms. What performance numbers
> > > > > > > are others seeing?
> > > > > > >
> > > > > > > I have 35 views with only 50 to 100 entries per view. My db is less
> > > > > > > than a gigabyte with a few thousand active docs.
> > > > > > >
> > > > > > > I'm running on a medium ec2 instance with ephemeral disk. I assume I am
> > > > > > > IO bound as the cpu is not maxing out.
> > > > > > >
> > > > > > > How much worse would this get if the db also had to handle replication
> > > > > > > between multiple servers?
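
For anyone trying the short-ID idea from the thread on Python 3, here is a minimal sketch of the same base conversion as plain functions. The names encode, decode and short_ids are made up for this illustration and are not from the original posts; only the 64-character alphabet is quoted from the thread.

```python
import itertools

# Alphabet quoted from the thread (64 symbols).
COUCH_ALPHABET = "-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"


def encode(n, alphabet=COUCH_ALPHABET):
    """Encode a non-negative integer as a string over `alphabet`."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    out = []
    while n > 0:
        # divmod keeps everything in integer arithmetic, so big counters stay exact
        n, rem = divmod(n, base)
        out.append(alphabet[rem])
    return "".join(reversed(out))


def decode(s, alphabet=COUCH_ALPHABET):
    """Inverse of encode: turn a string over `alphabet` back into an integer."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return n


def short_ids(start=0, alphabet=COUCH_ALPHABET):
    """Yield encoded IDs for start, start + 1, ... (hypothetical helper)."""
    for n in itertools.count(start):
        yield encode(n, alphabet)
```

Calling list(itertools.islice(short_ids(), 3)) gives the first three IDs, '-', '@' and '0'. One thing worth checking before relying on the ordering: compared character by character in alphabet order, the two-character ID '@-' (64) sorts before the one-character 'Z' (63), so strict ordering only holds among IDs of equal length.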

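The batch-write tip in the thread (collect the documents and send them in a single access) maps onto CouchDB's _bulk_docs endpoint, which accepts a POST with a JSON body of the form {"docs": [...]}. Below is a rough sketch using only the Python standard library. The helper names, the batch size of 500, and the server URL and database name in the usage note are assumptions for illustration, not something taken from the thread.

```python
import json
import urllib.request


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def bulk_docs_request(base_url, db, docs):
    """Build (but do not send) a POST request for the _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url="{}/{}/_bulk_docs".format(base_url.rstrip("/"), db),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def write_in_batches(base_url, db, docs, batch_size=500):
    """Send docs in batches and collect the per-document results."""
    results = []
    for batch in chunked(docs, batch_size):
        req = bulk_docs_request(base_url, db, batch)
        with urllib.request.urlopen(req) as resp:
            results.extend(json.loads(resp.read().decode("utf-8")))
    return results
```

A call would look like write_in_batches("http://localhost:5984", "mydb", docs), where "mydb" is a placeholder database name. The batch size is only a starting point; the best value depends on document size and would need measuring.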