On 17 June 2010 20:06, J Chris Anderson <[email protected]> wrote:

>
> On Jun 17, 2010, at 9:29 AM, afters wrote:
>
> > On 17 June 2010 18:10, J Chris Anderson <[email protected]> wrote:
> >
> >> The reduce-limit is a general heuristic, because some very bad reduces
> >> will actually grow asymptotically so that the full reduce contains as
> >> much data as the entire group=true reduce. It sounds like yours is OK
> >> (large but not growing) so you are probably fine (although keeping 4kb
> >> of stuff in the intermediate reduction value storage is going to kill
> >> performance).
> >
> > I could limit it to 1kb perhaps - at this point it doesn't matter too
> > much.
> > I imagine it would still maim, if not kill, performance. Correct?
>
> I bet 1kb will be more than 4 times faster than 4kb, so it's worth a shot.
> But I'm guessing you are probably better off in terms of scalability to
> have a lean reduce index, and use the results from that to know which
> document to fetch.
>
> OTOH if you are gonna be working only with smaller data sets, then you may
> even be fine with what you've got. Just be aware that with large reductions
> (especially reductions that are giant when called without group=true) you
> are introducing a bunch of overhead, and things will slow down as your
> database grows.
>

Is it correct that reductions spread up the b-tree only as high as needed to
satisfy the group-level demands?

> If you keep your reduces simple, like _sum and _count, or similar data
> structures, you should be fine.
>
> Read this for a survey of reduction techniques that can scale
> http://labs.google.com/papers/sawzall.html
>

I will look into that. Thanks.

> >> Any way to break it up and maybe use the reduce to know which document
> >> to query to get the big blob of text?
> >
> > I could certainly do that. Indeed my original plan, before discovering the
> > magic of 'group=true', was to fetch each piece of entity-data separately.
> >
> > a.
> >
> >> Chris
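
For reference, a minimal sketch of the "lean reduce index, then fetch the document" approach suggested in the thread. It is a pure-Python model, not CouchDB's API. The document layout (entity, updated, text), the "newest row wins" rule, and every name below (DOCS, map_doc, lean_reduce, grouped_reduce, fetch_latest) are assumptions made up for illustration. The point is only that the reduced value stays a small fixed-size pointer, and the big text blob is read in a second step.

```python
# Pure-Python model of the idea; this is not CouchDB's API.
# DOCS and all function names here are hypothetical.

DOCS = {
    "doc-1": {"entity": "a", "updated": 1, "text": "x" * 4096},
    "doc-2": {"entity": "a", "updated": 5, "text": "y" * 4096},
    "doc-3": {"entity": "b", "updated": 2, "text": "z" * 4096},
}


def map_doc(doc_id, doc):
    """Emit one (key, value) row whose value is small: (timestamp, doc id)."""
    yield doc["entity"], (doc["updated"], doc_id)


def lean_reduce(values, rereduce=False):
    """Keep only the newest (updated, doc_id) pair.

    The output has the same shape as each input, so the same logic
    serves for reduce and rereduce, and the value never grows.
    """
    return max(values)


def grouped_reduce(docs):
    """Model a group=true query: one reduced value per key."""
    by_key = {}
    for doc_id, doc in docs.items():
        for key, value in map_doc(doc_id, doc):
            by_key.setdefault(key, []).append(value)
    return {key: lean_reduce(values) for key, values in by_key.items()}


def fetch_latest(docs, entity):
    """Use the lean index to pick a document id, then fetch that document."""
    _updated, doc_id = grouped_reduce(docs)[entity]
    return doc_id, docs[doc_id]["text"]
```

Calling fetch_latest(DOCS, "a") first consults the small per-key value and only then reads the 4kb text from the chosen document, so the large blob never sits in the reduction values.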
