Re: Bug or my lack of understanding? "Reduce output must shrink more rapidly"

Robert Newson Wed, 17 Aug 2011 10:55:57 -0700

The reduce_limit heuristic is there to stop you from writing bad
reduce functions that are destined to fail in production as the
document count increases. The result of a reduce call should be
strictly smaller than its input (and preferably a lot smaller).


If the number of keys in the returned object is fixed, you'll probably
be fine, though testing with a sizeable number of documents (and
graphing the performance curve) will prove it.
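
To illustrate the fixed-number-of-keys case, here is a minimal sketch
of a reduce that always returns the same three keys regardless of how
many rows it sees. The names statsReduce and finalize are hypothetical,
and it assumes the map function emits plain numbers as values; it is
not taken from the view discussed in this thread.

```javascript
// Hypothetical CouchDB-style reduce; statsReduce and finalize are
// illustrative names, not part of CouchDB. Assumes numeric map values.
function statsReduce(keys, values, rereduce) {
  var acc = { count: 0, sum: 0, sumsqr: 0 };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (rereduce) {
      // v is a partial result from an earlier reduce call: merge it.
      acc.count += v.count;
      acc.sum += v.sum;
      acc.sumsqr += v.sumsqr;
    } else {
      // v is a raw number emitted by the map function.
      acc.count += 1;
      acc.sum += v;
      acc.sumsqr += v * v;
    }
  }
  return acc; // always exactly three keys, whatever the input size
}

// Derive the statistics on the client (or in a list function) from
// the small accumulator instead of storing them in the reduce output.
function finalize(acc) {
  var mean = acc.sum / acc.count;
  var variance = acc.sumsqr / acc.count - mean * mean; // population variance
  return { mean: mean, variance: variance, stddev: Math.sqrt(variance) };
}
```

Because the accumulator only holds running totals, its size does not
grow with the number of documents, and the mean, population variance
and standard deviation can be computed from it afterwards. CouchDB's
built-in _stats reduce follows the same idea, returning sum, count,
min, max and sumsqr.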

B.

On 17 August 2011 18:24, Chris Stockton <[email protected]> wrote:
> Hello,
>
> On Tue, Aug 16, 2011 at 11:58 PM, Marcello Nuccio
> <[email protected]> wrote:
>> I don't think this is a bug. Citing from
>> http://guide.couchdb.org/draft/cookbook.html#aggregate
>>
>>    As a rule of thumb, the reduce function should reduce to a single
>> scalar value. That is, an integer; a string; or a small, fixed-size
>> list or object that includes an aggregated value (or values) from the
>> values argument.
>>
>> The object returned by your reduce function is not fixed-size.
>> Actually, it is bigger than the input document.
>>
>> Marcello
>>
>
> I actually think we fit the rule of thumb, in the sense that the
> object returned is a fixed-size object of scalars. It just so happens
> to have lots of keys. This is necessary because we do much more than
> just the total: we do the population variance, population standard
> deviation, square, sums, the average total including nulls, and the
> average total not including nulls.
>
> I won't go too deep into our application's architecture, but this
> record represents a "row" in a list of user data with different kinds
> of data types. Typically users don't want any kind of stats, so we
> just serve out the documents. However, some users want statistical
> information about their data, and in those cases we run the reduce.
> It is a win because we have one view for both the map and the reduce.
> Redesigning our map to emit columns would mean a separate view for
> statistics, which means a couple of things that I am not okay with,
> such as double the disk space and multiple view calls per statistics
> load.
>
> At the end of the day, I disagree that this is not a bug. I think
> there is room for improvement in this area and that it should be
> discussed. If this is not the appropriate place to do so, I can file
> a JIRA ticket.
>
> -Chris
>
