2011/2/23 Łukasz Mielicki <[email protected]>: > On Wed, Feb 23, 2011 at 22:03, Paul Davis <[email protected]> wrote: > >>> For: reduce (null, [key_range_A1_A2, key_range_B1_B2, key_range_C1_C2], >>> true) >>> Does the relation A1 < A2 < B1 < B2 < C1 < C2 hold? Does it depend on >>> descending? >>> For: reduce ([k1,k2,k3], [v1,v2,v3], false) >>> Can I depend on k1 < k2 < k3 if descending=true? > >> These are specifically the assertions I'm saying you can't rely on. In >> your case you will absolutely need to sort the input data. > >> The only assumption I think its possible to make is that re-reductions >> will always process reductions from contiguous parts of the b+tree. >> I'm only at the *think* stage of certainty. I haven't gone through the >> implementation to check if that's an absolute truth, and I can't yet >> guarantee that it won't ever change if it is true. >> >> Just in case that doesn't make sense, imagine you have a set of >> reductions that are from ordered regions of the view b+tree labeled as >> such: A, B, C, D. That is to say, data in the reduction A is based >> entirely of keys <= reduction keys in B <= reduction keys in C <= >> reduction keys in D. Given that, the only assumption I think is >> possible is that you are guaranteed that if A and C are passed >> simultaneously to a re-reduce, then B is guaranteed to be passed as >> well. In other words, you wont get a reduction graph that looks like >> ((A | C) | (B | D)) which would break this sort of approach. > > Thanks for clarifying that. > The latter assumption was so inherent to me I didn't even consider it. > Don't think there would be much use of such reduce results at query > time. > > However form bird's eye view basing on that one I cannot see a reason > not to preserve the first two (in terms of computational complexity > even with reduce parallelized). It seems to be only a matter of > preserving original b-tree partitioning in parameter passing. > Providing such guarantees would be beneficial in two ways: > - building a view would not depend on query parameters (sane) > - left/right-reduce semantics available (without sorting) > > What you think?
I don't think any of those things should be guaranteed because that means we won't be able to change them at some point in the future if the need arises and I can't predict what changes we might need to make for various situations. I'm also suddenly wondering how this would behave when run on a cluster with the reductions.
