I was just googling and came across this thread. I agree a less aggressive checkpoint pattern could be beneficial. I think there has been some discussion about this on the dev list. Now I've gotta dig it up.
Was thinking about looking at the way couch_view_updater interacts with
couch_work_queue.

Chris

On Mon, Aug 23, 2010 at 12:04 AM, Sebastian Cohnen <[email protected]> wrote:
> Hey Jamie,
>
> first, I don't know anything about view checkpointing and how/if it could
> be customized in order to make couch commit less often, sorry :)
>
> (more replies inline)
>
> On 23.08.2010, at 08:38, Jamie Talbot wrote:
>
>> Tuyen Tran <ituyen@...> writes:
>>>
>>> We have a view that is checkpointing on every update and taking a long
>>> time to generate.
>>> <snip..>
>>> Has anyone seen similar performance? Are my documents too big with too
>>> many fields?
>>>
>>> Thanks,
>>> -T
>>
>> I have almost the same situation as you, using CouchDB 1.0. Only 3000
>> documents in this sample database (of an overall document set of 450000).
>> Each document is about 200KB, and contains an array of JSON objects that
>> each have 3 small properties.
>>
>> My view emits a large key of 6 parts (an array of timestamp components)
>> and a value array with 2 integers, In and Out. Without a reduce step it
>> takes 5m20s to generate. With a reduce step that does a sum of Ins and
>> Outs, it takes more than 30 minutes. Each document takes about 7 seconds
>> to process. It checkpoints after every document.
>
> Raw speed is hard to compare, though 7s per document just for emitting
> some fields of each document and summing up some values sounds quite slow.
> Since you are emitting non-scalar values you cannot use the built-in
> reduce functions, which are *very* fast (implemented in Erlang, running
> inside CouchDB, no serialization overhead). But using a custom-written
> Erlang view could still be a very good option.
>
>> When looking at the size of the view, it comes out at about 900MB of
>> data, from a 30MB database. After compacting, this drops to 90MB, or a
>> factor of 10. I found 0.10 significantly faster, though I don't have hard
>> numbers, and didn't try 0.11.
>
> What are you emitting as keys for your view? This kind of discrepancy in
> size between the compacted and non-compacted view can be a sign that you
> are emitting very large or complex keys (at least this is my experience).
> Maybe you can have a look in this direction to optimize.
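To make Sebastian's two suggestions concrete, here's a rough, untested
sketch of what a native Erlang view could look like. It assumes the Erlang
query server is enabled in local.ini ([native_query_servers] section,
erlang = {couch_native_process, start_link, []}), and the field names
(entries, ts, in, out) are made up, since the actual schema hasn't been
posted. Both funs would go in a design doc with "language": "erlang".

    %% Map: one row per entry. `entries`, `ts`, `in` and `out` are
    %% hypothetical field names -- adjust to the real document schema.
    %% Emitting a single integer timestamp instead of a 6-part array
    %% keeps the b-tree keys small; the trade-off is losing
    %% group_level aggregation on the key components.
    fun({Doc}) ->
        Entries = proplists:get_value(<<"entries">>, Doc, []),
        lists:foreach(fun({Entry}) ->
            Ts  = proplists:get_value(<<"ts">>, Entry),
            In  = proplists:get_value(<<"in">>, Entry, 0),
            Out = proplists:get_value(<<"out">>, Entry, 0),
            Emit(Ts, [In, Out])
        end, Entries)
    end.

    %% Reduce: sum the [In, Out] pairs. Re-reduce inputs are previous
    %% reduce outputs with the same [In, Out] shape, so one fold covers
    %% both phases.
    fun(_Keys, Values, _ReReduce) ->
        lists:foldl(fun([In, Out], [AccIn, AccOut]) ->
            [AccIn + In, AccOut + Out]
        end, [0, 0], Values)
    end.

This sidesteps the JSON round-trip to couchjs entirely, and the smaller
key should shrink the index: every emitted key is stored in the view
b-tree, so six-element array keys add up fast across millions of rows.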
>> On these numbers, Couch is unfortunately going to be unusable. For the
>> full document set, it is likely to take 44 days to build the view, and
>> will take roughly 1.5TB, which will compact down to 150GB. Once it's up
>> and running, it will probably be fine; we only add a document every 2
>> minutes, so a 7-second build time and calling stale=ok on the client
>> will suffice. However, the risk on the view file is too great to bear.
>> If it were to be corrupted (Couch does an excellent job at avoiding
>> this, but you need to plan for the worst), it would take a month and a
>> half to rebuild.
>
> View corruption is very unlikely, but you can copy view files around like
> databases, so you could easily copy the views from your backup/slave/...
> system to the server that got corrupted. So that shouldn't be a real
> problem.
>
>> I have seen a number of posts where people have started considering a
>> different view-building algorithm that is oriented towards performance.
>> I would personally love to see a "risky=true" build option for the
>> views, which focussed more on performance and less on stability, on the
>> understanding that if we crashed while generating it, we'd have to start
>> again. For the initial load, and rebuilds, that would be a price worth
>> paying. We're never going to have less data!
>>
>> I'm also keen to hear people's experiences with this.
>>
>> Kind Regards,
>>
>> Jamie.

--
Chris Anderson
http://jchrisa.net
http://couchbase.com
