Tuyen Tran <itu...@...> writes:
>
> We have a view that is checkpointing on every update and taking a long time
> to generate.
> <snip..>
> Has anyone seen similar performance? Are my documents too big with too many fields?
>
> Thanks,
> -T

I have almost the same situation as you, using CouchDB 1.0.

There are only 3000 documents in this sample database (out of an overall document set of 450000). Each document is about 200KB and contains an array of JSON objects, each of which has 3 small properties. My view emits a large key of 6 parts (an array of timestamp components) and a value array with 2 integers, In and Out.

Without a reduce step, the view takes 5m20s to generate. With a reduce step that sums the Ins and Outs, it takes more than 30 minutes. Each document takes about 7 seconds to process, and the view checkpoints after every document.

The view file comes out at about 900MB of data, from a 30MB database. After compacting, this drops to 90MB, a reduction by a factor of 10.

I found 0.10 significantly faster, though I don't have hard numbers, and I didn't try 0.11.

On these numbers, Couch is unfortunately going to be unusable. For the full document set, the view is likely to take 44 days to build and roughly 1.5TB of disk, which will compact down to 150GB.

Once it's up and running, it will probably be fine; we only add a document every 2 minutes, so a 7 second build time and calling stale=ok on the client will suffice. However, the risk on the view file is too great to bear. If it were to be corrupted (Couch does an excellent job of avoiding this, but you need to plan for the worst), it would take a month and a half to rebuild.

I have seen a number of posts where people have started considering a different view-building algorithm that is oriented towards performance. I would personally love to see a "risky=true" build option for views, which focussed more on performance and less on stability, on the understanding that if we crashed while generating the view, we'd have to start again. For the initial load, and for rebuilds, that would be a price worth paying. We're never going to have less data!

I'm also keen to hear people's experiences with this.

Kind Regards,
Jamie.
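
As a rough sanity check on the build-time estimate above, here is a minimal Python sketch. It assumes a constant per-document cost and uses only figures quoted in this message; the function names are illustrative and are not part of CouchDB.

```python
SECONDS_PER_DAY = 86_400


def build_days(doc_count: int, seconds_per_doc: float) -> float:
    """Estimated full view build time in days, assuming a constant cost per document."""
    return doc_count * seconds_per_doc / SECONDS_PER_DAY


def implied_seconds_per_doc(doc_count: int, days: float) -> float:
    """Per-document cost implied by a given total build time in days."""
    return days * SECONDS_PER_DAY / doc_count


FULL_SET = 450_000  # overall document count quoted in the message

# Build time at the quoted ~7 seconds per document.
print(round(build_days(FULL_SET, 7), 1))  # 36.5

# Per-document cost implied by the quoted 44-day estimate.
print(round(implied_seconds_per_doc(FULL_SET, 44), 2))  # 8.45
```

Taking 7 seconds per document at face value gives about 36.5 days for 450000 documents, while the 44-day figure corresponds to roughly 8.4 seconds per document. Either way, a full rebuild is measured in weeks.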
