On 21/12/2008, at 10:48 PM, Paul Davis wrote:

On Sun, Dec 21, 2008 at 1:20 AM, Antony Blakey <[email protected]> wrote:

On 21/12/2008, at 3:25 PM, Paul Davis wrote:

Hmm. I don't think so. Technically we could change the implementation
to feed reduce functions a single map row at a time, which would be
O(N). That might end up violating some (incorrect) expectations about
reduce input, but I'm pretty certain it shows the reduce step is O(N).
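
For concreteness, a reduce function receives (keys, values, rereduce):
on the first pass values are raw map rows, and on a rereduce they are
previously reduced partials. A minimal counting sketch (TypeScript for
readability; the body is just the textbook count example, not actual
CouchDB code):

    // CouchDB-style reduce signature: (keys, values, rereduce).
    function countReduce(
      keys: Array<[unknown, string]> | null, // [key, docid] pairs; null on rereduce
      values: unknown[],
      rereduce: boolean
    ): number {
      if (rereduce) {
        // values are previously reduced partial counts: combine them
        return (values as number[]).reduce((acc, v) => acc + v, 0);
      }
      // values are raw map-row values: one per emitted row
      return values.length;
    }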

But you have to combine it with previously reduced input, which means
fetching a number of previous results, at a cost that isn't clearly
related to N. Isn't the number of reductions that need to be combined
going to be related to the key distribution?


Good point. I just woke up so the brain is working kinda slow, but are
we describing best-case and worst-case scenarios too? If each key had
exactly one row and it was group=true, then it'd be O(N). If it was
group=false and every key was identical, it'd be O(log N). And then
normal use cases are somewhere in the middle?
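
As a toy model of those two extremes (my own back-of-envelope, assuming
a B-tree with stored intermediate reductions and some fanout f, not
actual CouchDB internals):

    // group=true with N distinct keys: one reduction per key group, so N calls.
    // group=false: a query combines one stored partial per tree level,
    // which is roughly log_f(N) calls.
    function reduceCalls(n: number, group: boolean, fanout = 64): number {
      if (group) return n;                                            // O(N)
      return Math.max(1, Math.ceil(Math.log(n) / Math.log(fanout))); // O(log N)
    }

    reduceCalls(1_000_000, true);  // 1000000
    reduceCalls(1_000_000, false); // 4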

Where the original N was new rows, versus this N being total rows?

e.g. average case is O(new rows) + O(log old rows)

My complexity maths is weak.
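
For what it's worth, written out as a toy cost model (same hypothetical
B-tree assumptions as above, nothing measured):

    // O(new rows) to reduce the incoming rows, plus O(log old rows)
    // rereduce steps along the updated root-to-leaf path.
    function incrementalUpdateCost(newRows: number, oldRows: number,
                                   fanout = 64): number {
      const pathLen = oldRows > 1
        ? Math.ceil(Math.log(oldRows) / Math.log(fanout))
        : 1;
      return newRows + pathLen;
    }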

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.
  -- Antoine de Saint-Exupéry

