On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:

> Hi All,
>
> As I am playing more and more with couchdb (it is relaxing and fun), i just
> am trying to understand the limits and the expectations in large production
> system environment.
>
> Right now i have about 100K documents and i have about 10 different views,
> one of the view generates does about 100 emits per document.
>
> As i am building the view indexes, it is taking about 7-8 hours of time.
>
This is about right for 10 million rows. That works out to about 350 rows per second (maybe more, depending on what your other views are doing). That is a bit slower than I'm used to seeing, but it depends on the size of your emitted keys and values. If you can shrink the keys or the values you should see some speedup (marginal, not an order of magnitude).

Because view generation is incremental, the 7-8 hours isn't the big issue in production; what matters is whether view generation can keep up with the insert rate. So if you are inserting fewer than a few documents per second (x 100 emitted rows each), you should be able to keep the indexes current. If the indexes start to fall behind, I'd suggest either upgrading hardware or moving to a clustered solution like CouchDB Lounge.

For purposes of prototyping, you will probably be happier working on a subset of the documents.

> I would like to know is that how are other people using it?
> Is 7-8 or even 24 hours of checkpointing view generation typical?
> How many documents do people have??
> How is other people's experience in genereting a view on lets say 1 MIllion
> documents.
>
> I have switched to the native _sum function for reduce. What else is taking
> long? Is it the map function written in JavaScript? Is it the Index that's
> getting too big?
>

Using an Erlang view function could potentially speed things up, but my guess is that you are more likely disk-I/O bound than CPU bound, so it may not make much difference.

> Is the view generation linear or does it gets worse when you have more
> documents?
>

Each insert into the btree should get slower roughly as O(log n), where n is the number of rows. The base of the log is pretty big, too, so overall it is only somewhat worse than linear. Once you get up into billion-row territory you'll probably want to look more closely at CouchDB Lounge or Cloudant's clustering.

> I would extremely appreciate help in answering or discussing these questions.
>
> Thanks in advance,
> Talib
>
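For anyone who wants to redo the back-of-the-envelope numbers from this thread, here is a minimal Python sketch. It is not CouchDB code. It assumes the indexer sustains the same rows/second incrementally as it did during the full build, and `btree_depth` uses a hypothetical fixed fanout of 100 purely to illustrate why a large log base keeps the tree shallow; it is not CouchDB's actual btree node layout.

```python
# Back-of-the-envelope view-indexing arithmetic, using the figures from the
# thread: 100K documents, ~100 emitted rows per document, 7-8 hours to build.

def total_rows(num_docs: int, emits_per_doc: int) -> int:
    """Number of view rows the map function produces."""
    return num_docs * emits_per_doc


def rows_per_second(rows: int, hours: float) -> float:
    """Observed indexing throughput for a full build."""
    return rows / (hours * 3600)


def max_docs_per_second(index_rate: float, emits_per_doc: int) -> float:
    """Insert rate at which incremental indexing just keeps up.

    Assumption: the indexer keeps the same rows/second it had on the full build.
    """
    return index_rate / emits_per_doc


def btree_depth(rows: int, fanout: int) -> int:
    """Smallest depth d with fanout**d >= rows (integer math, no float log).

    The fixed fanout is hypothetical; it only shows that a big log base
    keeps depth, and therefore per-insert cost, growing very slowly.
    """
    depth, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


rows = total_rows(100_000, 100)
for hours in (7, 8):
    rate = rows_per_second(rows, hours)
    docs = max_docs_per_second(rate, 100)
    print(f"{hours} h: {rate:.0f} rows/s, keeps up with ~{docs:.1f} docs/s")

# Going from 10 million to a billion rows only adds one level at fanout 100.
print(btree_depth(10_000_000, 100), btree_depth(1_000_000_000, 100))
```

At 8 hours this gives roughly 347 rows/s, i.e. about 3.5 documents per second at 100 emits each, which is where "a few documents per second" comes from.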
