2010/9/3 Hans-Jürgen Schönig <h...@cybertec.at>: > On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: >> I agree. Explicit partitioning may open up some additional optimization >> possibilities in certain cases, but Merge Append is more general and >> extremely valuable in its own right. > > we have revised greg's wonderful work and ported the entire thing to head. > it solves the problem of merge_append. i did some testing earlier on today > and it seems most important cases are working nicely.
First, thanks for merging this up to HEAD. I took a look through this patch tonight, and the previous reviews thereof that I was able to find, most notably Tom's detailed review on 2009-07-26. I'm not sure whether or not it's accidental that this didn't get added to the CF, but here's an attempt to enumerate the things that seem like they need to be fixed. The quotes labeled "TGL" are from the aforementioned review by Tom. 1. The code in set_append_rel_pathlist() that accumulates the pathkeys of all sub-paths is, as it says, and as previous discussed, O(n^2). In a previous email on this topic, Tom suggested on possible approach for this problem: choose the largest child relation and call it the leader, and consider only the pathkeys for that relation rather than all of them. I think it would be nice if we can find a way to be a bit smarter, though, because that won't necessarily always find the best path. One idea I had is to choose some arbitrary limit on how long the all_pathkeys list is allowed to become and iterate over the children from largest to smallest, stopping early if you hit that limit. But thinking about it a little more, can't we just adjust the way we do this so that it's not O(n^2)? It seems we're only concerned with equality here, so what about using a hash table? We could hash the PathKey pointers in each list, but not the lists or listcells obviously. 2. TGL: "you need an explicit flag to say 'we'll do a merge', not just rely on whether the path has pathkeys." This makes sense and doesn't seem difficult. 3. TGL: "Speaking of sorting, it's not entirely clear to me how the patch ensures that all the child plans produce the necessary sort keys as output columns, and especially not how it ensures that they all get produced in the *same* output columns. This might accidentally manage to work because of the "throwaway" call to make_sort_from_pathkeys(), but at the very least that's a misleading comment." I'm not sure what needs to be done about this; I'm going to look at this further. 4. TGL: "In any case, I'm amazed that it's not failing regression tests all over the place with those critical tests in make_sort_from_pathkeys lobotomized by random #ifdef FIXMEs. Perhaps we need some more regression tests...". Obviously, we need to remove that lobotomy and insert the correct fix for whatever problem it was trying to solve. Adding some regression tests seems wise, too. 5. TGL: "In the same vein, the hack to 'short circuit' the append stuff for a single child node is simply wrong, because it doesn't allow for column number variances. Please remove it." This seems like straightforward cleanup, and maybe another candidate for a regression test. (Actually, I notice that the patch has NO regression tests at all, which surely can't be right for something of this type, though admittedly since we didn't have EXPLAIN (COSTS OFF) when this was first written it might have been difficult to write anything meaningful at the time.) 6. The dummy call to cost_sort() seems like a crock; what that function does doesn't seem particularly relevant to the cost of the merge operation. Just off the top of my head, it looks like the cost of the merge step will be roughly O(lg n) * the cost of comparing two tuples * the total number of tuples from all child paths. In practice it might be less, because once some of the paths run out of tuples the number of comparisons will drop, I think. But the magnitude of that effect seems difficult to predict, and may be rather small, so perhaps we should just ignore it. 7. I think there's some basic code cleanup needed here, also: comment formatting, variable naming, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers