Thanks Chris,
This is extremely helpful.
-Talib
On Jul 23, 2010, at 6:42 PM, J Chris Anderson wrote:
On Jul 23, 2010, at 5:01 PM, Talib Sharif wrote:
Hi All,
As I am playing more and more with CouchDB (it is relaxing and
fun), I am trying to understand the limits and the expectations
in a large production environment.
Right now I have about 100K documents and about 10 different
views; one of the views does about 100 emits per document.
Building the view indexes is taking about 7-8 hours.
This is about right for 10 million rows. That works out to about
350 rows per second (maybe more, depending on what your other
views are doing), which is a bit slower than I'm used to seeing,
but it depends on the size of your emitted keys and values. If
you can shrink the keys or the values you should see some speedup
(marginal, not an order of magnitude).
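For example, a map function along these lines keeps each emitted
row small (the tags field is made up for illustration; emitting
null as the value, and using include_docs=true at query time if
you need the documents, is a common way to shrink the index):

  // A sketch of a compact map function: one emitted row per tag.
  // Emitting null instead of a copy of the document keeps the
  // index rows as small as possible.
  function(doc) {
    if (doc.tags) {
      for (var i = 0; i < doc.tags.length; i++) {
        emit(doc.tags[i], null);
      }
    }
  }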
Because view generation is incremental, in production the 7-8
hours isn't the big issue; it's whether view generation can keep
up with the insert rate. So if you are generating fewer than a
few documents per second (x 100 emitted rows) you should be able
to keep the indexes current. If the indexes start to fall behind,
I'd suggest either upgrading hardware or moving to a clustered
solution like CouchDB-Lounge.
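One pattern that can help here (a sketch, not something from the
original mail; the database and design doc names are made up):
query with ?stale=ok so reads return immediately from the
last-built index, and refresh the index outside the request path:

  curl 'http://127.0.0.1:5984/mydb/_design/app/_view/by_tag?stale=ok&limit=1'

A cron job that hits the same view without stale=ok every minute
or so will then do the actual index catch-up in the background.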
For purposes of prototyping, you will probably be happier working
on a subset of the documents.
I would like to know how other people are using it. Is 7-8 (or
even 24) hours of view generation, with checkpointing, typical?
How many documents do people have? What is other people's
experience generating a view on, let's say, 1 million documents?
I have switched to the native _sum function for reduce. What else
is taking so long? Is it the map function written in JavaScript?
Is it the index that's getting too big?
Using an Erlang view function could potentially speed things up
(but my guess is you are more likely disk-I/O bound than CPU
bound, so maybe it won't make much difference).
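For reference, the built-in reduce mentioned above looks like
this in a design document (a minimal sketch; the design doc and
view names are made up):

  {
    "_id": "_design/stats",
    "views": {
      "by_tag": {
        "map": "function(doc) { if (doc.tags) { for (var i = 0; i < doc.tags.length; i++) { emit(doc.tags[i], 1); } } }",
        "reduce": "_sum"
      }
    }
  }

The built-in _sum, _count, and _stats reduces run inside the
Erlang server and skip the round-trip to the JavaScript query
server, which is why they are much faster than a hand-written
JavaScript reduce.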
Is the view generation linear, or does it get worse when you have
more documents?
The btree should get slower at roughly O(log n), where n is the
number of rows. The base of the log is pretty big, too. Once you
get up into billion-row territory you'll probably want to look
more closely at CouchDB-Lounge or Cloudant's clustering.
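To make the big-base point concrete (the fanout here is an
assumed order of magnitude for illustration, not CouchDB's actual
btree parameter):

  // Rough depth estimate for a btree: depth ~= log_fanout(rows).
  var fanout = 100;                               // assumed
  var rows = 1e9;
  var depth = Math.log(rows) / Math.log(fanout);  // ~4.5 levels

So even at a billion rows, a lookup touches only a handful of
btree nodes.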
I would greatly appreciate help in answering or discussing these
questions.
Thanks in advance,
Talib