The reduce_limit heuristic is there to save you from writing bad reduce functions that are destined to fail in production as the document count increases. The output of a reduce call should be strictly smaller than its input (and preferably a lot smaller).
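To make that concrete, here is a minimal sketch of a reduce whose output stays the same size no matter how many rows it sees. It is not the view from this thread: the names reduceStats and populationVariance are hypothetical, and it assumes the map function emits one plain number per row. CouchDB's built-in _stats reduce computes a similar aggregate.

```javascript
// Hypothetical reduce function; assumes the map emits one number per row.
// The returned object always has the same three keys, so its size does
// not grow with the number of documents.
function reduceStats(keys, values, rereduce) {
  var acc = { count: 0, sum: 0, sumsqr: 0 };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (rereduce) {
      // v is an object returned by an earlier reduceStats call
      acc.count += v.count;
      acc.sum += v.sum;
      acc.sumsqr += v.sumsqr;
    } else {
      // v is a raw number emitted by the map function
      acc.count += 1;
      acc.sum += v;
      acc.sumsqr += v * v;
    }
  }
  return acc;
}

// Derived statistics are computed from the aggregate, outside the view.
function populationVariance(s) {
  var mean = s.sum / s.count;
  return s.sumsqr / s.count - mean * mean;
}

// Local simulation of two reduce calls followed by a rereduce.
var left = reduceStats(null, [1, 2, 3], false);
var right = reduceStats(null, [4, 5], false);
var total = reduceStats(null, [left, right], true);
console.log(JSON.stringify(total));     // {"count":5,"sum":15,"sumsqr":55}
console.log(populationVariance(total)); // 2
```

Keeping only the sums in the view and deriving variance or standard deviation in the client is one way to keep the reduce output small; whether that fits a given application is a separate question.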
If the number of keys in the returned object is fixed, you'll probably be fine, though testing with a sizeable number of documents (and graphing the performance curve) will prove it.

B.

On 17 August 2011 18:24, Chris Stockton <[email protected]> wrote:
> Hello,
>
> On Tue, Aug 16, 2011 at 11:58 PM, Marcello Nuccio
> <[email protected]> wrote:
>> I don't think this is a bug. Citing from
>> http://guide.couchdb.org/draft/cookbook.html#aggregate
>>
>> As a rule of thumb, the reduce function should reduce to a single
>> scalar value. That is, an integer; a string; or a small, fixed-size
>> list or object that includes an aggregated value (or values) from the
>> values argument.
>>
>> The object returned by your reduce function, is not fixed-size.
>> Actually it is bigger than the input document.
>>
>> Marcello
>>
>
> I actually think we fit the rule of thumb in the sense the object
> returned is fixed size object of scalars. It just so happens to have
> lots of keys. The reason this is necessary is because we do much more
> than just the total, we do the population variance, population
> standard deviation, square, sums, the average total including nulls,
> and the average total not including nulls.
>
> I won't go too deep into our applications architecture but this record
> represents a "row" in a list of user data with different kinds of data
> types, typically users don't want any kinds of stats etc, so we just
> serve out the documents. However some users want statistical
> information about their data, so in those cases we run the reduce, it
> is a win because we have 1 view for both the map and reduce. To
> redesign our map to emit columns means a separate view for statistics,
> which means a couple things that I am not okay with, like double the
> disk space, multiple view calls per statistics load.
>
> At the end of the day I think I disagree that this is not a bug, I
> think there is room for improvement in this area and should be
> discussed. If this is not the appropriate place to do so I can write a
> jira ticket.
>
> -Chris
>
