Re: Some stats about couch DB

J Chris Anderson Fri, 23 Jul 2010 18:42:39 -0700

On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:

> Hi All,
> 
> As I am playing more and more with couchdb (it is relaxing and fun), I am just 
> trying to understand the limits and the expectations in a large production 
> system environment.
> 
> Right now I have about 100K documents and about 10 different views; 
> one of the views does about 100 emits per document.
> 
> Building the view indexes is taking about 7-8 hours.
>


This is about right for 10 million rows (100K documents x 100 emits each). That 
works out to about 350 rows per second (maybe more, depending on what your other 
views are doing), which is a bit slower than I'm used to seeing, but it depends 
on the size of your emitted keys and values. If you can shrink the keys or the 
values you should see some speedup (marginal, not an order of magnitude).
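
For reference, the back-of-the-envelope arithmetic behind that figure can be 
sketched as follows. The function name is only illustrative; the inputs are the 
numbers from your message (100K documents, about 100 emits each, 7-8 hours).

```python
def rows_per_second(docs, emits_per_doc, hours):
    """Average indexing throughput in emitted rows per second."""
    total_rows = docs * emits_per_doc      # 100K docs x 100 emits = 10M rows
    return total_rows / (hours * 3600)     # hours -> seconds

# Throughput at both ends of the reported 7-8 hour build time.
for hours in (7, 8):
    print(hours, "hours:", round(rows_per_second(100_000, 100, hours)), "rows/sec")
```

This only counts rows from the heaviest view, so the real throughput across all 
ten views is somewhat higher.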

Because view generation is incremental, the 7-8 hours isn't the big issue in 
production; what matters is whether view generation can keep up with the insert 
rate. So if you are inserting fewer than a few documents per second (x 100 
emitted rows) then you should be able to keep the indexes current. If the 
indexes start to fall behind I'd suggest either upgrading hardware or moving to 
a clustered solution like CouchDB-Lounge.
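
A rough way to check whether the indexer can keep up, assuming the roughly 350 
rows per second estimated above holds for incremental updates too (that is an 
assumption; measure it on your own hardware). The function names are 
hypothetical helpers, not part of CouchDB.

```python
def max_sustainable_doc_rate(index_rows_per_sec, emits_per_doc):
    """Highest insert rate (docs/sec) the indexer can absorb without lagging."""
    return index_rows_per_sec / emits_per_doc

def index_keeps_up(insert_docs_per_sec, emits_per_doc, index_rows_per_sec):
    """True if the rows produced per second do not exceed indexing throughput."""
    return insert_docs_per_sec * emits_per_doc <= index_rows_per_sec

# Assumed throughput of 350 rows/sec and 100 emits per document.
print(max_sustainable_doc_rate(350, 100))   # docs/sec the index can sustain
print(index_keeps_up(3, 100, 350))          # a few docs/sec
print(index_keeps_up(5, 100, 350))          # a somewhat higher insert rate
```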

For purposes of prototyping, you will probably be happier working on a subset of 
the documents.


> I would like to know how other people are using it.
> Is 7-8 or even 24 hours of checkpointing view generation typical?
> How many documents do people have?
> What is other people's experience in generating a view on, let's say, 1 million 
> documents?
> 
> I have switched to the native _sum function for reduce. What else is taking 
> long? Is it the map function written in JavaScript? Is it the Index that's 
> getting too big?
> 


Using an Erlang view function could potentially speed things up, but my guess 
is you are more likely disk-IO bound than CPU bound, so it may not make much 
difference.


> Is the view generation linear or does it get worse when you have more 
> documents?
> 


The btree should get slower at roughly O(log n), where n is the number of rows. 
The base of the log is pretty big, too, so the slowdown is gradual. Once you get 
up to the billion-rows territory you'll probably want to look more closely at 
CouchDB-Lounge or Cloudant's clustering.
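
To give a feel for how slowly O(log n) grows with a large base, here is a small 
sketch. The fanout of 100 is purely an illustrative assumption, not CouchDB's 
actual node size, and the function is a simplified model of a full tree rather 
than CouchDB's btree code.

```python
def btree_depth(rows, fanout):
    """Levels needed for a tree with the given fanout to hold `rows` entries."""
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout   # each extra level multiplies capacity by the fanout
        depth += 1
    return depth

# Assumed fanout of 100: compare 10 million rows with 1 billion rows.
for rows in (10_000_000, 1_000_000_000):
    print(rows, "rows ->", btree_depth(rows, 100), "levels")
```

Under that assumption, a hundredfold increase in rows adds only one level to 
the tree, which is why the per-row cost rises so gently.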

> I would extremely appreciate help in answering or discussing these questions.
> 
> Thanks in advance,
> Talib
>
