On Sun, Apr 12, 2009 at 12:51 PM, Kenneth Kalmer <[email protected]> wrote:
> On Sun, Apr 12, 2009 at 1:11 AM, Chris Anderson <[email protected]> wrote:
>
>> On Sat, Apr 11, 2009 at 12:06 PM, Paul Davis
>> <[email protected]> wrote:
>> > On Sat, Apr 11, 2009 at 2:58 PM, Kenneth Kalmer
>> > <[email protected]> wrote:
>> >> On Thu, Apr 9, 2009 at 5:17 PM, Paul Davis <[email protected]> wrote:
>> >>
>> >>> Kenneth,
>> >>>
>> >>> I'm pretty sure your issue is in the reduce steps for the daily and
>> >>> monthly views. The general rule of thumb is that you shouldn't be
>> >>> returning data that grows faster than log(#keys processed), whereas I
>> >>> believe your data is growing linearly with input.
>> >>>
>> >>> This particular limitation is a result of the implementation of
>> >>> incremental reductions. Basically, each key/pointer pair stores the
>> >>> re-reduced value for all [re-]reduce values in its children nodes. So
>> >>> as your reduction moves up the tree the data starts exploding, which
>> >>> kills btree performance, not to mention the extra file I/O.
>> >>>
>> >>> The basic moral of the story is that if you want reduce views like
>> >>> this per user you should emit a [user_id, date] pair as the key and
>> >>> then call your reduce views with group=true.
>> >>>
>> >>> HTH,
>> >>> Paul Davis
>> >>>
>> >>
>> >> Hi Paul
>> >>
>> >> Thanks for taking the trouble of investigating for me, I'll dive into
>> >> the views and clean them up a bit according to your advice as well as
>> >> brush up on the caveat you explained. I saw other threads in the
>> >> archives where you gave similar advice, sorry for not realizing I
>> >> stepped into the same trap. When I've got the issue resolved I'll
>> >> update the gist and we can leave it as a point of reference for others.
>> >>
>> >> Thanks again!
>> >>
>> >
>> > It's kind of a hard one to notice right away as it's not an error, it
>> > just kills performance. Perhaps Damien was right in that we should
>> > think about adding log vomiting when we detect that there's a crap
>> > load of data accumulating in the reductions.
>> >
>>
>> I agree -- maybe another config setting
>> max_intermediate_reduction_size or something. So that you can raise it
>> if you really know what you are doing. Unless there are hard limits,
>> in which case we should just error properly when we reach them.
>>
>
> Hi Paul & Chris
>
> This would help, I'm sure a lot of people would be caught in this trap
> initially.
>
> I've cleaned up my views a bit and they are much more performant now. On
> our "production" couch, where there are currently 6.6 million docs, the
> indexing has been running for close to 18 hours and is 80% done. I
> killed the previous indexing task, since after 5 days it was only
> 50-something percent done with 3.1 million docs at the time it started.
>

Yeah, that sounds much closer to the expected performance.

> After going through the docs carefully again and clearly thinking through my
> problem, as well as taking the "emit([key, doc.user])" advice from Paul more
> seriously, I got it working. The docs give the warning, without any real
> references, making it sound like a "yeah whatever" kinda thing.

I've updated the wiki with hopefully a more stern warning about the expected
data characteristics of reduce functions.

> This is dangerous. However the real gem lies in a line I picked up somewhere
> in the wiki, it stresses that the reduce views should build a summary, not
> aggregate data, which was my mistake. I now aggregate the data in my own app
> with two extra lines of code and the views now become very powerful using
> group_level. So my old 'days' and 'daily' views are now combined in a
> single, more useful, 'daily' view.
>
> I'll update the gist as soon as my DSL is fixed at home and blog on my
> learning curve as well, as soon as I can conjure up a nice example for
> rereduce, which I also only figured out through this exercise.
>
> Thanks again for helping the newbies, the willingness of everyone here to
> assist definitely helps drive couch adoption.
>
> Best
>
> --
> Kenneth Kalmer
> [email protected]
> http://opensourcery.co.za

HTH,
Paul Davis
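
For anyone finding this thread in the archives, here is a rough sketch of the pattern discussed above: emit a compound key, keep the reduce output a fixed-size summary, and let grouping do the per-user and per-day slicing. This is plain Python that only simulates the idea; it is not CouchDB's API. The document fields (user_id, date, amount), the [user_id, year, month, day] key layout, and the helper names are all assumptions made up for illustration.

```python
from itertools import groupby


def map_doc(doc):
    """Map step: emit a compound [user_id, year, month, day] key.

    The field names and key layout are hypothetical, chosen only to
    mirror the "[user_id, date]" advice from the thread.
    """
    year, month, day = (int(part) for part in doc["date"].split("-"))
    yield [doc["user_id"], year, month, day], doc["amount"]


def reduce_sum(values):
    """Reduce step: return one number, however many values come in.

    The output size stays constant, so it is a summary rather than an
    ever-growing aggregate of the input.
    """
    return sum(values)


def query(docs, group_level):
    """Simulate querying a reduce view grouped on a key prefix.

    Grouping on the full key length plays the role of group=true;
    shorter prefixes play the role of smaller group_level values.
    """
    rows = sorted(row for doc in docs for row in map_doc(doc))

    def prefix(row):
        return tuple(row[0][:group_level])

    return [
        (list(key), reduce_sum(value for _, value in group))
        for key, group in groupby(rows, key=prefix)
    ]


docs = [
    {"user_id": "alice", "date": "2009-04-11", "amount": 3},
    {"user_id": "alice", "date": "2009-04-11", "amount": 2},
    {"user_id": "alice", "date": "2009-04-12", "amount": 5},
    {"user_id": "bob", "date": "2009-04-11", "amount": 7},
]

daily = query(docs, group_level=4)     # one row per user per day
monthly = query(docs, group_level=3)   # one row per user per month
per_user = query(docs, group_level=1)  # one row per user
```

The same index serves the daily, monthly, and per-user questions, which is roughly why the separate 'days' and 'daily' views could be merged into one. Any per-key lists or hashes are better built in the application from the grouped rows than inside the reduce itself.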
