Re: Some guidance with extremely slow indexing

Paul Davis Sun, 12 Apr 2009 10:19:39 -0700

On Sun, Apr 12, 2009 at 12:51 PM, Kenneth Kalmer
<[email protected]> wrote:
> On Sun, Apr 12, 2009 at 1:11 AM, Chris Anderson <[email protected]> wrote:
>
>> On Sat, Apr 11, 2009 at 12:06 PM, Paul Davis
>> <[email protected]> wrote:
>> > On Sat, Apr 11, 2009 at 2:58 PM, Kenneth Kalmer
>> > <[email protected]> wrote:
>> >> On Thu, Apr 9, 2009 at 5:17 PM, Paul Davis <[email protected]> wrote:
>> >>
>> >>> Kenneth,
>> >>>
>> >>> I'm pretty sure your issue is in the reduce steps for the daily and
>> >>> monthly views. The general rule of thumb is that you shouldn't be
>> >>> returning data that grows faster than log(#keys processed), whereas I
>> >>> believe your data is growing linearly with input.
>> >>>
>> >>> This particular limitation is a result of the implementation of
>> >>> incremental reductions. Basically, each key/pointer pair stores the
>> >>> re-reduced value for all [re-]reduce values in its children nodes. So
>> >>> as your reduction moves up the tree the data starts exploding which
>> >>> kills btree performance not to mention the extra file I/O.
>> >>>
>> >>> The basic moral of the story is that if you want reduce views like
>> >>> this per user you should emit a [user_id, date] pair as the key and
>> >>> then call your reduce views with group=true.
>> >>>
>> >>> HTH,
>> >>> Paul Davis
>> >>>
>> >>
>> >> Hi Paul
>> >>
>> >> Thanks for taking the trouble of investigating for me, I'll dive into the
>> >> views and clean them up a bit according to your advice as well as brush up
>> >> on the caveat you explained. I saw other threads in the archives where you
>> >> gave similar advice, sorry for not realizing I stepped into the same trap.
>> >> When I've got the issue resolved I'll update the gist and we can leave it as
>> >> a point of reference for others.
>> >>
>> >> Thanks again!
>> >>
>> >
>> > It's kind of a hard one to notice right away as it's not an error, it
>> > just kills performance. Perhaps Damien was right in that we should
>> > think about adding log vomiting when we detect that there's a crap
>> > load of data accumulating in the reductions.
>> >
>>
>> I agree -- maybe another config setting
>> max_intermediate_reduction_size or something. So that you can raise it
>> if you really know what you are doing. Unless there are hard-limits,
>> in which case we should just error properly when we reach them.
>>
>
> Hi Paul & Chris
>
> This would help, I'm sure a lot of people would be caught in this trap
> initially.
>
> I've cleaned up my views a bit and they are much more performant now. On our
> "production" couch, which currently holds 6.6 million docs, the indexing has
> been running for close to 18 hours and is 80% done. I killed the previous
> indexing task, since after 5 days it was only 50-something percent done
> with 3.1 million docs at the time it started.
>


Yeah, that sounds much closer to the expected performance.

> After going through the docs carefully again and clearly thinking through my
> problem, as well as taking the "emit([key, doc.user])" advice from Paul more
> seriously, I got it working. The docs give the warning without any real
> references, making it sound like a "yeah whatever" kinda thing.

I've updated the wiki with what is hopefully a sterner warning about the
expected data characteristics of reduce functions.
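
For reference, here is a minimal sketch of the difference between a reduce that builds a summary and one that aggregates, runnable in Node.js outside CouchDB. The reduce functions use the (keys, values, rereduce) signature that CouchDB passes to JavaScript reduces. The runReduce and reducedSize helpers are hypothetical stand-ins for incremental reduction, not CouchDB's real B-tree code.

```javascript
// Reduce that builds a summary: the output stays a single number
// no matter how many rows have been processed.
function summaryReduce(keys, values, rereduce) {
  // Summing works the same way for reduce and rereduce passes.
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Reduce that aggregates: the output grows linearly with the rows processed.
function aggregatingReduce(keys, values, rereduce) {
  if (rereduce) {
    // values is a list of arrays from earlier passes; flatten them.
    return values.reduce(function (acc, v) { return acc.concat(v); }, []);
  }
  return values.slice();
}

// Hypothetical local stand-in for incremental reduction: reduce
// fixed-size chunks of rows, then rereduce the chunk results once.
function runReduce(reduceFn, rows, chunkSize) {
  var partials = [];
  for (var i = 0; i < rows.length; i += chunkSize) {
    var chunk = rows.slice(i, i + chunkSize);
    var keys = chunk.map(function (r) { return [r.key, r.id]; });
    var values = chunk.map(function (r) { return r.value; });
    partials.push(reduceFn(keys, values, false));
  }
  return reduceFn(null, partials, true);
}

// Length in characters of the JSON a reduce hands back for n rows.
function reducedSize(reduceFn, n) {
  var rows = [];
  for (var i = 0; i < n; i++) {
    rows.push({ id: "doc" + i, key: "k", value: 1 });
  }
  return JSON.stringify(runReduce(reduceFn, rows, 10)).length;
}

console.log("summary:    ", reducedSize(summaryReduce, 100), reducedSize(summaryReduce, 1000));
console.log("aggregating:", reducedSize(aggregatingReduce, 100), reducedSize(aggregatingReduce, 1000));
```

Going from 100 to 1000 rows barely changes the size of the summary result, while the aggregated result grows roughly tenfold. That second shape is the kind of value that ends up stored at every level of the tree.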

> This is
> dangerous. However the real gem lies in a line I picked up somewhere in the
> wiki, it stresses that the reduce views should build a summary, not
> aggregate data, which was my mistake. I now aggregate the data in my own app
> with two extra lines of code and the views now become very powerful using
> group_level. So my old 'days' and 'daily' views are now combined in a
> single, more useful, 'daily' view.
>
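
A rough sketch of how a single view with a compound key can serve both the per-day and per-month cases, again runnable in Node.js. The field names doc.user, doc.date and doc.bytes and the [user, year, month, day] key layout are assumptions made for illustration, not the actual views from the gist. In CouchDB itself emit is a global and the map function takes only doc; it is passed in explicitly here so the sketch runs on its own. queryGrouped is a hypothetical local imitation of querying a sum reduce with group_level=N, and it orders rows by plain string comparison rather than CouchDB's collation.

```javascript
// Map function in the shape a design document would hold it.
// doc.user, doc.date and doc.bytes are hypothetical field names.
function mapDoc(doc, emit) {
  var d = doc.date.split("-"); // "2009-04-12" -> ["2009", "04", "12"]
  emit([doc.user, d[0], d[1], d[2]], doc.bytes);
}

// Hypothetical imitation of a grouped query: sum the emitted values
// per key prefix of the given length.
function queryGrouped(docs, level) {
  var totals = {};
  docs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
      var prefix = JSON.stringify(key.slice(0, level));
      totals[prefix] = (totals[prefix] || 0) + value;
    });
  });
  return Object.keys(totals).sort().map(function (k) {
    return { key: JSON.parse(k), value: totals[k] };
  });
}

var docs = [
  { user: "alice", date: "2009-04-11", bytes: 10 },
  { user: "alice", date: "2009-04-12", bytes: 5 },
  { user: "alice", date: "2009-04-12", bytes: 7 },
  { user: "bob", date: "2009-04-12", bytes: 3 }
];

console.log(queryGrouped(docs, 4)); // one row per user per day
console.log(queryGrouped(docs, 3)); // one row per user per month
```

The reduce output stays a single number per group, and any collecting of rows into larger structures is left to the application after the query.
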
> I'll update the gist as soon as my DSL is fixed at home and blog on my
> learning curve as well, as soon as I can conjure up a nice example for
> rereduce, which I also only figured out through this exercise.
>
> Thanks again for helping the newbies, the willingness of everyone here to
> assist definitely helps drive couch adoption.
>
> Best
>
> --
> Kenneth Kalmer
> [email protected]
> http://opensourcery.co.za
>

HTH,
Paul Davis
