I was just googling and came across this thread. I agree a less aggressive checkpoint pattern could be beneficial. I think there has been some discussion about this on the dev list. Now I've gotta dig it up.
Was thinking about looking at the way couch_view_updater interacts with
couch_work_queue.

Chris

On Mon, Aug 23, 2010 at 12:04 AM, Sebastian Cohnen <[email protected]> wrote:
> Hey Jamie,
>
> first, I don't know anything about view checkpointing and how/if it could
> be customized in order to make couch commit less often, sorry :)
>
> (more replies inline)
>
> On 23.08.2010, at 08:38, Jamie Talbot wrote:
>
>> Tuyen Tran <ituyen@...> writes:
>>>
>>> We have a view that is checkpointing on every update and taking a long
>>> time to generate.
>>> <snip..>
>>> Has anyone seen similar performance? Are my documents too big with too
>>> many fields?
>>>
>>> Thanks,
>>> -T
>>
>> I have almost the same situation as you, using CouchDB 1.0. Only 3000
>> documents in this sample database (of an overall document set of 450000).
>> Each document is about 200KB, and contains an array of JSON objects that
>> each have 3 small properties.
>>
>> My view emits a large key of 6 parts (an array of timestamp components)
>> and a value array with 2 integers, In and Out. Without a reduce step it
>> takes 5m20s to generate. With a reduce step that does a sum of Ins and
>> Outs, it takes more than 30 minutes. Each document takes about 7 seconds
>> to process. It checkpoints after every document.
>
> Raw speed is hard to compare, though 7s per document just for emitting
> some fields of each document and summing up some values sounds quite slow.
> Since you are emitting non-scalar values you cannot use the built-in
> reduce functions, which are *very* fast (implemented in Erlang, running
> inside CouchDB, no serialization overhead). But using a custom-written
> Erlang view could still be a very good option.
>
>> When looking at the size of the view, it comes out at about 900MB of
>> data, from a 30MB database. After compacting, this drops to 90MB, or a
>> factor of 10. I found 0.10 significantly faster, though I don't have hard
>> numbers, and didn't try 0.11.
>
> What are you emitting as keys for your view? This kind of discrepancy in
> size between the compacted and non-compacted view can be a sign that you
> are emitting very large or complex keys (at least this is my experience).
> Maybe you can have a look in this direction to optimize.
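To make Sebastian's two suggestions concrete, here's a rough, untested
sketch of what a native Erlang view could look like. It assumes the Erlang
query server is enabled in local.ini ([native_query_servers] section,
erlang = {couch_native_process, start_link, []}), and the field names
(entries, ts, in, out) are made up, since the actual schema hasn't been
posted. Both funs would go in a design doc with "language": "erlang".

    %% Map: one row per entry. `entries`, `ts`, `in` and `out` are
    %% hypothetical field names -- adjust to the real document schema.
    %% Emitting a single integer timestamp instead of a 6-part array
    %% keeps the b-tree keys small; the trade-off is losing
    %% group_level aggregation on the key components.
    fun({Doc}) ->
        Entries = proplists:get_value(<<"entries">>, Doc, []),
        lists:foreach(fun({Entry}) ->
            Ts  = proplists:get_value(<<"ts">>, Entry),
            In  = proplists:get_value(<<"in">>, Entry, 0),
            Out = proplists:get_value(<<"out">>, Entry, 0),
            Emit(Ts, [In, Out])
        end, Entries)
    end.

    %% Reduce: sum the [In, Out] pairs. Re-reduce inputs are previous
    %% reduce outputs with the same [In, Out] shape, so one fold covers
    %% both phases.
    fun(_Keys, Values, _ReReduce) ->
        lists:foldl(fun([In, Out], [AccIn, AccOut]) ->
            [AccIn + In, AccOut + Out]
        end, [0, 0], Values)
    end.

This sidesteps the JSON round-trip to couchjs entirely, and the smaller
key should shrink the index: every emitted key is stored in the view
b-tree, so six-element array keys add up fast across millions of rows.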
>> On these numbers, Couch is unfortunately going to be unusable. For the
>> full document set, it is likely to take 44 days to build the view, and
>> will take roughly 1.5TB, which will compact down to 150GB. Once it's up
>> and running, it will probably be fine; we only add a document every 2
>> minutes, so a 7-second build time and calling stale=ok on the client
>> will suffice. However, the risk on the view file is too great to bear.
>> If it were to be corrupted (Couch does an excellent job at avoiding
>> this, but you need to plan for the worst), it would take a month and a
>> half to rebuild.
>
> View corruption is very unlikely, but you can copy view files around like
> databases, so you could easily copy the views from your backup/slave/...
> system to the server that got corrupted. So that shouldn't be a real
> problem.
>
>> I have seen a number of posts where people have started considering a
>> different view-building algorithm that is oriented towards performance.
>> I would personally love to see a "risky=true" build option for the
>> views, which focussed more on performance and less on stability, on the
>> understanding that if we crashed while generating it, we'd have to start
>> again. For the initial load, and rebuilds, that would be a price worth
>> paying. We're never going to have less data!
>>
>> I'm also keen to hear people's experiences with this.
>>
>> Kind Regards,
>>
>> Jamie.

--
Chris Anderson
http://jchrisa.net
http://couchbase.com
