Re: [HACKERS] Path question
I wrote: I rather wonder if we don't want two separate execution-time node types anyway, since what Append does seems significantly different from Merge (and MergeAppend would be just a misnomer). I've been working on this patch, and have gotten the executor side split out as a new node type. That adds something like 600 lines of boilerplate code in various places, but I think it's well worthwhile to get rid of the spaghetti-code effect of retail checks to see which kind of Append we're dealing with. (Greg didn't catch all the places that needed to distinguish, anyway.) I did run into a problem with my plan to call the new node type Merge: the planner is already using MergePath as the name for the Path struct that is precursor to a MergeJoin. For the moment I'm calling the new node type MergeAppend, but as mentioned I feel that that's a bit of a misnomer. The only good alternative that I can think of is to rename MergePath to MergeJoinPath (and then for consistency probably also HashPath to HashJoinPath and NestPath to NestLoopPath). While that wouldn't touch a huge number of files, it seems to carry some risk of breaking pending patches, and anyway those are names that go back to Berkeley days so people are used to 'em. Anybody have a strong feeling about what to call these things? At the moment I'm leaning to sticking with MergeAppend, but if we decide to rename it it'd probably be better to do so before committing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Thu, Oct 14, 2010 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: I rather wonder if we don't want two separate execution-time node types anyway, since what Append does seems significantly different from Merge (and MergeAppend would be just a misnomer). I've been working on this patch, and have gotten the executor side split out as a new node type. That adds something like 600 lines of boilerplate code in various places, but I think it's well worthwhile to get rid of the spaghetti-code effect of retail checks to see which kind of Append we're dealing with. (Greg didn't catch all the places that needed to distinguish, anyway.) I did run into a problem with my plan to call the new node type Merge: the planner is already using MergePath as the name for the Path struct that is precursor to a MergeJoin. For the moment I'm calling the new node type MergeAppend, but as mentioned I feel that that's a bit of a misnomer. The only good alternative that I can think of is to rename MergePath to MergeJoinPath (and then for consistency probably also HashPath to HashJoinPath and NestPath to NestLoopPath). While that wouldn't touch a huge number of files, it seems to carry some risk of breaking pending patches, and anyway those are names that go back to Berkeley days so people are used to 'em. Anybody have a strong feeling about what to call these things? At the moment I'm leaning to sticking with MergeAppend, but if we decide to rename it it'd probably be better to do so before committing. I don't like the idea of renaming the join nodes. Both the code churn and the possibility of confusing long-time users seem undesirable. Let's just stick with MergeAppend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Robert Haas robertmh...@gmail.com writes: On Thu, Oct 14, 2010 at 11:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anybody have a strong feeling about what to call these things? At the moment I'm leaning to sticking with MergeAppend, but if we decide to rename it it'd probably be better to do so before committing. I don't like the idea of renaming the join nodes. Both the code churn and the possibility of confusing long-time users seem undesirable. Yeah, especially if MergePath would still be there but now meaning something different. The other possible line of attack is to call the new node type something else than either Merge or MergeAppend. Robert and I batted around a few thoughts off-list last night, but none of them seemed any better than MergeAppend. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Robert Haas robertmh...@gmail.com writes: So I tried out the logic described in this email and, with a few modifications, it seemed to work. Updated patch attached, any review appreciated. Applied with revisions. 3. TGL: Speaking of sorting, it's not entirely clear to me how the patch ensures that all the child plans produce the necessary sort keys as output columns, and especially not how it ensures that they all get produced in the *same* output columns. This might accidentally manage to work because of the throwaway call to make_sort_from_pathkeys(), but at the very least that's a misleading comment. I'm not sure what needs to be done about this; I'm going to look at this further. I spent some time looking at this complaint, and I'm still not sure what needs to be done about it. Experimentation reveals that the neither the throwaway call to make_sort_from_pathkeys() nor any other call to that function is creating the relevant target list entries. Yeah, this was in fact pretty broken. make_sort_from_pathkeys *does* create the relevant tlist entries, but they were getting lost again because they were attached to a Result node that went into the bit bucket along with the throwaway Sort node. Much worse, it was hit or miss whether the tlist entries would be present in the child plan nodes --- as coded, this would be the case only for children that were explicitly sorted to meet the merge's pathkey requirements. If they weren't there, any sorting on resjunk values would misbehave. I was able to develop a test case that exposed this by using expression indexes, so that went into the regression tests. I fixed this by refactoring make_sort_from_pathkeys so that the tlist fixup logic was separately accessible. There were assorted other bugs too. I think it's all good now, but we'll find out when beta starts ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Thu, Oct 14, 2010 at 8:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: I did run into a problem with my plan to call the new node type Merge: the planner is already using MergePath as the name for the Path struct that is precursor to a MergeJoin. For the moment I'm calling the new node type MergeAppend, but as mentioned I feel that that's a bit of a misnomer. On the plus side the resulting node does have a lot in common with Append nodes and a lot of places that do something special with Append nodes will have to do the same things with the new node, so having a similar name might help people remember that when they're adding their special code for Append. At the time I went back and forth on whether to have a separate node. I tried to do it and had the impression that there were a lot more places that would need to treat the two similarly than places that needed to treat the two differently. I'm curious to see how you do it cleanly. The only other name I batted around at the time was OrderedAppend which only alters the other half of the name, so no real help. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Robert Haas robertmh...@gmail.com writes: Another awkwardness of this patch is that it makes create_append_path() and consequently set_dummy_rel_pathlist() take an additional root argument. While there's nothing terribly unreasonable about this on its face, it's only necessary so that create_append_path() can call cost_sort(), which takes root but doesn't actually use it. I'm not sure whether it's better to leave this as-is or to remove the root argument from cost_sort(). Right offhand the cleanest answer to that seems to be to leave create_append_path alone, and make a separate function named something like create_ordered_append_path that handles the case where cost_sort might be needed. I rather wonder if we don't want two separate execution-time node types anyway, since what Append does seems significantly different from Merge (and MergeAppend would be just a misnomer). I have to run off for a doctors appointment, will continue looking at this patch when I get back. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Tue, Sep 28, 2010 at 11:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ...what makes the pathkeys potentially useful is that they match the root pathkeys for the query. However, without Greg's hacks, the transposed child equivalence classes don't show up in the equivalence member lists, so get_eclass_for_sort_expr() generates new equivalence classes for them. Subsequently, when truncate_useless_pathkeys() is called, those new equivalence classes are found not to be equal to the overall ordering desired for the query, so it truncates them away. I'm not presently sure quite what the best way to fix this is... any ideas? Probably what's needed is to extend the child eclass member stuff to cover sort-key eclasses. Per relation.h: * em_is_child signifies that this element was built by transposing a member * for an inheritance parent relation to represent the corresponding expression * on an inheritance child. The element should be ignored for all purposes * except constructing inner-indexscan paths for the child relation. Likely these need to be ignored a bit less. A couple of Greg's undocumented hacks seem to be related to this point, but not all of them. In any case you'd need some careful thought about when to ignore child EMs and when not. Makes sense to me. It seems that there are actually two halves to this problem: getting the child EMs to be generated in the first place, and then getting them to be examined at the appropriate time. The FIXME in set_append_rel_pathlist() is there to ensure that add_child_rel_equivalences() always gets called, and the hack in add_child_rel_equivalences() is there to ensure that child EMs are generated unconditionally rather than only when they're useful for merging. That doesn't seem entirely off-base. In set_append_rel_pathlist(), I think that we shouldn't set has_eclass_joins on the child relation unless it's set on the parent, but calling add_child_rel_equivalences() unconditionally might be reasonable. Similarly, in add_child_rel_equivalences(), I think that the cur_ec-ec_has_const test should be retained, but the test on list_length(cur_ec-ec_members) needs to be relaxed or eliminated. As a matter of correctness, it seems like it should be OK to do this always - the worst thing that can happen is you end up with some extra EMs that don't (or shouldn't) do anything. However, it's possible that, for performance, we might want to avoid generating EMs that won't actually be needed. I'm not entirely clear on whether there is an inexpensive way for us to know that. Perhaps we could replace the list_length() test with a call to has_useful_pathkeys() and just accept that there will be some false positives. The two FIXMEs in make_sort_from_pathkeys() are there to ensure that we find the child EMs when we go to generate a sort. There's no reason that I can see to remove the em_has_const test, but it looks like we need to remove the em_is_child test. We need to match each pathkey to an element of the child's target list, which means we must consider an EM that appears in the child's target list, which is means a child EM or nothing. Testing reveals that if you keep Greg's hacks to add the child EMs but remove the hacks from make_sort_from_pathkeys(), it still works IF every child can use an index scan, but if you need to do an index scan on some children and sort others then you get: ERROR: could not find pathkey item to sort ...which seems consistent with the above analysis. If we accept that it's necessary, that still leaves the question of whether it's safe. I'm a little less certain on this point, but it seems like it ought to be OK. As far as I'm aware, we currently never sort append children. Indeed, unless equivalence classes for those append children were getting created by some mechanism other than add_child_rel_equivalences(), it seems like it ought to fail with the above error messages. And on the flip side it should be impossible for us to pick up a child EM when we meant to get some other one because they shouldn't be equal(). Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/23 Robert Haas robertmh...@gmail.com: All of this leaves me wondering why Greg ended up ifdefing out this code in the first place. There's probably something I'm missing here... but for now I can't think of a better idea than just removing the #ifdefs and hoping that whatever problem they were causing was limited to an earlier version of the code that no longer exists. Sorry, I haven't gone completely AWOL, I just hadn't noticed this thread. pgsql-hackers turns out to be kind of a lot of traffic if you're not reading it continuously. As you subsequently discovered I added these FIXMEs because without them the paths just didn't compare equal when it was comparing the paths of the children with the paths of the parent. I think the reason you had difficulty demonstrating problems with at least some of the FIXMEs was that they really aren't functionally necessary. They're there because when Tom implemented the equivalence classes he had a clear idea of what they were supposed to represent and what variables they needed to include to represent that. And including variables of child relations or subqueries of a UNION in an equivalence class representing the parent relation just didn't fit within that. It doesn't necessarily cause problems but it changes the data structure representation invariant from what he had in mind. I don't have a good grasp of exactly what the implications of changing this data structure rep invariant are though. Is having hundreds or thousands of variables in a single equivalence class (actually for a whole bunch if the pathkey list is long) going to cause performance problems? Is including variables that are only present for one child of the relation going to limit the usefulness of the equivalence class data structure for solving other problems where those columns really aren't equivalent? Also, since I don't really understand what's going on I wondered if there wasn't an obvious way to accomplish the same thing perhaps more efficiently. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Wed, Sep 29, 2010 at 11:01 AM, Robert Haas robertmh...@gmail.com wrote: Makes sense to me. It seems that there are actually two halves to this problem: getting the child EMs to be generated in the first place, and then getting them to be examined at the appropriate time. So I tried out the logic described in this email and, with a few modifications, it seemed to work. Updated patch attached, any review appreciated. There are still a bunch of other things that need to be fixed here, but I think this is OK as far as this particular issue is concerned. I fixed a few other things: - create_append_plan must call create_plan_recurse rather than create_plan. This appears to be an error introduced by rebasing; the previous version of the patch seg faults when attempting to execute a plan wherein an inner index scan has been pushed down through an append node - remove the hack to short-circuit the append node altogether if there's only one child - some miscellaneous code cleanup (more is needed) 3. TGL: Speaking of sorting, it's not entirely clear to me how the patch ensures that all the child plans produce the necessary sort keys as output columns, and especially not how it ensures that they all get produced in the *same* output columns. This might accidentally manage to work because of the throwaway call to make_sort_from_pathkeys(), but at the very least that's a misleading comment. I'm not sure what needs to be done about this; I'm going to look at this further. I spent some time looking at this complaint, and I'm still not sure what needs to be done about it. Experimentation reveals that the neither the throwaway call to make_sort_from_pathkeys() nor any other call to that function is creating the relevant target list entries. It appears that they're getting created when set_append_rel_pathlist() calls adjust_appendrel_attrs(). I'm not sure whether that's sufficient or not. make_sort_from_pathkeys() could add any missing entries later, but I'm not too sure the order would match if that happened. Another awkwardness of this patch is that it makes create_append_path() and consequently set_dummy_rel_pathlist() take an additional root argument. While there's nothing terribly unreasonable about this on its face, it's only necessary so that create_append_path() can call cost_sort(), which takes root but doesn't actually use it. I'm not sure whether it's better to leave this as-is or to remove the root argument from cost_sort(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company merge_append_2010_09_29.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/23 Robert Haas robertmh...@gmail.com: All of this leaves me wondering why Greg ended up ifdefing out this code in the first place. There's probably something I'm missing here... but for now I can't think of a better idea than just removing the #ifdefs and hoping that whatever problem they were causing was limited to an earlier version of the code that no longer exists. ...and FAIL. I missed the blindingly obvious here, which is that without these tests commented out, while it passes regression tests, the merge append stuff stops working. I think for some reason without this stuff in there the appropriate index paths fail to get generated for the child rels. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/28 Robert Haas robertmh...@gmail.com: 2010/9/23 Robert Haas robertmh...@gmail.com: All of this leaves me wondering why Greg ended up ifdefing out this code in the first place. There's probably something I'm missing here... but for now I can't think of a better idea than just removing the #ifdefs and hoping that whatever problem they were causing was limited to an earlier version of the code that no longer exists. ...and FAIL. I missed the blindingly obvious here, which is that without these tests commented out, while it passes regression tests, the merge append stuff stops working. I think for some reason without this stuff in there the appropriate index paths fail to get generated for the child rels. Yep, that's the problem, all right. find_usable_indexes() calls build_index_pathkeys() to generate pathkeys for each available index on the child relation, and then calls truncate_useless_pathkeys() to get rid of any that aren't useful. In a query like this: explain select * from ma_parent order by name limit 10; ...what makes the pathkeys potentially useful is that they match the root pathkeys for the query. However, without Greg's hacks, the transposed child equivalence classes don't show up in the equivalence member lists, so get_eclass_for_sort_expr() generates new equivalence classes for them. Subsequently, when truncate_useless_pathkeys() is called, those new equivalence classes are found not to be equal to the overall ordering desired for the query, so it truncates them away. I'm not presently sure quite what the best way to fix this is... any ideas? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Robert Haas robertmh...@gmail.com writes: ...what makes the pathkeys potentially useful is that they match the root pathkeys for the query. However, without Greg's hacks, the transposed child equivalence classes don't show up in the equivalence member lists, so get_eclass_for_sort_expr() generates new equivalence classes for them. Subsequently, when truncate_useless_pathkeys() is called, those new equivalence classes are found not to be equal to the overall ordering desired for the query, so it truncates them away. I'm not presently sure quite what the best way to fix this is... any ideas? Probably what's needed is to extend the child eclass member stuff to cover sort-key eclasses. Per relation.h: * em_is_child signifies that this element was built by transposing a member * for an inheritance parent relation to represent the corresponding expression * on an inheritance child. The element should be ignored for all purposes * except constructing inner-indexscan paths for the child relation. Likely these need to be ignored a bit less. A couple of Greg's undocumented hacks seem to be related to this point, but not all of them. In any case you'd need some careful thought about when to ignore child EMs and when not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Robert Haas robertmh...@gmail.com writes: FIXME #1 and FIXME #2 were much harder to trigger. In fact, barring significant further lobotimization of the code, I couldn't. It's not that hard if you just tweak equivclass.c to make the order of equivalence-class lists different, viz diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index a20ed5f..9528d0b 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -353,7 +353,7 @@ add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids, { ec-ec_relids = bms_add_members(ec-ec_relids, relids); } - ec-ec_members = lappend(ec-ec_members, em); + ec-ec_members = lcons(em, ec-ec_members); return em; } Then for instance: regression=# create table t1 (f1 int); CREATE TABLE regression=# create table t2 () inherits (t1); CREATE TABLE regression=# explain select * from t1 a join t1 b using (f1); WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 WARNING: FIXME #1 Since the order of equivalence-class member lists isn't supposed to be semantically significant, I claim that the code in createplan has to be able to deal with this. Note that what this is triggering is the em_is_child condition. I think it may indeed be impossible to get a hit on the em_is_const case as the system currently stands; the reason being that an EC containing a const won't normally show up as a pathkey. It can only do so if it's below_outer_join, as the comment notes. Now the calls to make_sort_from_pathkeys in createplan.c are only used for constructing subsidiary sorts for a mergejoin, and we won't consider building a mergejoin with an EC that contains a const (see eclass_useful_for_merging). There are some calls in planner.c that are associated with doing a final sort or distinct, but I suspect they'd never be applied with a below_outer_join EC. So given the current usage of make_sort_from_pathkeys it might be pretty hard to get it applied to an EC containing a constant. That's not a reason for it not to handle the case, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Fri, Sep 24, 2010 at 5:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: It's not that hard if you just tweak equivclass.c to make the order of equivalence-class lists different, viz [...] Since the order of equivalence-class member lists isn't supposed to be semantically significant, I claim that the code in createplan has to be able to deal with this. [...] That's not a reason for it not to handle the case, though. I'm not disputing that those tests are correct. All I'm saying is that I can't figure out a way to write regression tests that fail if they are removed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Tue, Sep 21, 2010 at 12:29 AM, David Fetter da...@fetter.org wrote: On Mon, Sep 20, 2010 at 10:57:00PM -0400, Robert Haas wrote: 2010/9/3 Hans-Jürgen Schönig h...@cybertec.at: On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier on today and it seems most important cases are working nicely. First, thanks for merging this up to HEAD. I took a look through this patch tonight, and the previous reviews thereof that I was able to find, most notably Tom's detailed review on 2009-07-26. I'm not sure whether or not it's accidental that this didn't get added to the CF, It's because I missed putting it in, and oversight I've corrected. If we need to bounce it on to the next one, them's the breaks. [points elided] 7. I think there's some basic code cleanup needed here, also: comment formatting, variable naming, etc. Hans-Jürgen, Will you be able to get to this in the next couple of days? I don't see a response to this which I assume means no - I'm going to take a crack at fixing some of these issues. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Sep 23, 2010, at 3:29 PM, Robert Haas wrote: On Tue, Sep 21, 2010 at 12:29 AM, David Fetter da...@fetter.org wrote: On Mon, Sep 20, 2010 at 10:57:00PM -0400, Robert Haas wrote: 2010/9/3 Hans-Jürgen Schönig h...@cybertec.at: On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier on today and it seems most important cases are working nicely. First, thanks for merging this up to HEAD. I took a look through this patch tonight, and the previous reviews thereof that I was able to find, most notably Tom's detailed review on 2009-07-26. I'm not sure whether or not it's accidental that this didn't get added to the CF, It's because I missed putting it in, and oversight I've corrected. If we need to bounce it on to the next one, them's the breaks. [points elided] 7. I think there's some basic code cleanup needed here, also: comment formatting, variable naming, etc. Hans-Jürgen, Will you be able to get to this in the next couple of days? I don't see a response to this which I assume means no - I'm going to take a crack at fixing some of these issues. hello ... sorry for not getting back to you sooner. i am currently on the road for some days. we got the top 3 things fixed already. however, some code seems to be relying on a sorted list somewhere(???). we are in the process of sorting out most of the stuff. i guess we will have something done next week. sorry for the delay. many thanks, hans -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/23 Hans-Jürgen Schönig h...@cybertec.at: sorry for not getting back to you sooner. i am currently on the road for some days. we got the top 3 things fixed already. however, some code seems to be relying on a sorted list somewhere(???). we are in the process of sorting out most of the stuff. i guess we will have something done next week. Oh, cool. Is it possible for you to post your WIP any sooner? I've been looking at #4 today. Further details to follow after I finish beating my head against a wall. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/20 Robert Haas robertmh...@gmail.com: 4. TGL: In any case, I'm amazed that it's not failing regression tests all over the place with those critical tests in make_sort_from_pathkeys lobotomized by random #ifdef FIXMEs. Perhaps we need some more regression tests Obviously, we need to remove that lobotomy and insert the correct fix for whatever problem it was trying to solve. Adding some regression tests seems wise, too. So, I spent a lot of time today trying to understand why these random FIXMEs were inserted and just exactly what they break or, uh, fix. I didn't get very far. The regression tests all pass with the latest version of Merge Append, with and without these FIXMEs, and in fact when I extracted just those changes and applied them to HEAD, the regression tests still pass. So I said to myself: Let's come up with a regression test that illustrates why these changes are Evil and Dangerous, and then we can add that to the regression test suite, per Tom's suggestion. This proved to be easier said than done: apparently our code is so good that it works even with random sections commented out. Go us. As a debugging aide, instead of actually diking these tests out, I made them elog(LOG). This has the same effect as removing them entirely but it makes it possible to see whether you've triggered the relevant code path. Then I tried to trigger those code paths, which I shall hereinafter refer to as FIXME #1, FIXME #2, FIXME #3, and FIXME #4, as per the attached patch. It proved to be pretty easy to trigger FIXME #3; indeed, just about every query I tried did the job. CREATE TABLE parent (a integer NOT NULL, b integer NOT NULL); CREATE TABLE child (b integer, a integer, c text NOT NULL) INHERITS (parent); CREATE TABLE joinme (j integer NOT NULL); I populated joinme with a single row with the value 1, and child with a million rows with a running from 1 to a million, b running from 10 million down to 9 million and 1, and c getting random()::text || random()::text || random()::text || random()::text. Then you can trigger FIXME #3 with either of these two (the second also triggers FIXME #4): select * from joinme, parent where a = j and j = 1; select * from parent order by a; I believe that the first case fires because of the ec_has_const condition and the second from the EC only has at most one member condition. However, it's difficult to see what problem this actually causes. From what I gather from reading the code, em_is_child EquivalenceMembers are only supposed to affect the construction of inner index-scan paths; and certainly if there's at most one EquivalenceMember it's unclear to me how you'd be forming a join. Maybe it's possible to break something in the ec_has_const case, but I haven't been to puzzle out what that thing is. FIXME #1 and FIXME #2 were much harder to trigger. In fact, barring significant further lobotimization of the code, I couldn't. For FIXME #1, the right sort of query seems to be something like this: select 1 from joinme left join parent on a = j where j = 1 order by a; ...but the problem is that in every example I could construct, the em_is_child EquivalenceMembers were *after* the members for the parent rel, so you never get far enough in the loop to see them. For related reasons, I couldn't get FIXME #2 to fire, either: in order to have a chance of firing FIXME #2, you have to get all the way through the loop where FIXME #1 is located without finding a match. I can't believe all of this code is here just for fun, but in every example I could come up with you quickly find a match in the first loop, and never even finish visiting all the members of that list, let alone reach the second loop. Somehow, you need to construct a case where the values to be sorted aren't directly emitted by the node immediately under the sort, but I couldn't figure out how to do that - no matter how I rearranged things, the planner just computed the necessary outputs one level down. Anyone have an idea? I did find one apparent weirdness in the way explain outputs sort keys, namely: rhaas=# explain (costs off) select 2 as x union all select 1 as x order by x; QUERY PLAN Sort Sort Key: (2) - Append - Result - Result (5 rows) This seems to have nothing to do with the problem at hand, but it looks pretty weird. The planner is in fact sorting on the column, not on the literal value 2, so this is (I think) just a display error. x would be a more reasonable representation of the sort key. All of this leaves me wondering why Greg ended up ifdefing out this code in the first place. There's probably something I'm missing here... but for now I can't think of a better idea than just removing the #ifdefs and hoping that whatever problem they were causing was limited to an earlier version of the code that no longer exists. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Path question
2010/9/3 Hans-Jürgen Schönig h...@cybertec.at: On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier on today and it seems most important cases are working nicely. First, thanks for merging this up to HEAD. I took a look through this patch tonight, and the previous reviews thereof that I was able to find, most notably Tom's detailed review on 2009-07-26. I'm not sure whether or not it's accidental that this didn't get added to the CF, but here's an attempt to enumerate the things that seem like they need to be fixed. The quotes labeled TGL are from the aforementioned review by Tom. 1. The code in set_append_rel_pathlist() that accumulates the pathkeys of all sub-paths is, as it says, and as previous discussed, O(n^2). In a previous email on this topic, Tom suggested on possible approach for this problem: choose the largest child relation and call it the leader, and consider only the pathkeys for that relation rather than all of them. I think it would be nice if we can find a way to be a bit smarter, though, because that won't necessarily always find the best path. One idea I had is to choose some arbitrary limit on how long the all_pathkeys list is allowed to become and iterate over the children from largest to smallest, stopping early if you hit that limit. But thinking about it a little more, can't we just adjust the way we do this so that it's not O(n^2)? It seems we're only concerned with equality here, so what about using a hash table? We could hash the PathKey pointers in each list, but not the lists or listcells obviously. 2. TGL: you need an explicit flag to say 'we'll do a merge', not just rely on whether the path has pathkeys. This makes sense and doesn't seem difficult. 3. TGL: Speaking of sorting, it's not entirely clear to me how the patch ensures that all the child plans produce the necessary sort keys as output columns, and especially not how it ensures that they all get produced in the *same* output columns. This might accidentally manage to work because of the throwaway call to make_sort_from_pathkeys(), but at the very least that's a misleading comment. I'm not sure what needs to be done about this; I'm going to look at this further. 4. TGL: In any case, I'm amazed that it's not failing regression tests all over the place with those critical tests in make_sort_from_pathkeys lobotomized by random #ifdef FIXMEs. Perhaps we need some more regression tests Obviously, we need to remove that lobotomy and insert the correct fix for whatever problem it was trying to solve. Adding some regression tests seems wise, too. 5. TGL: In the same vein, the hack to 'short circuit' the append stuff for a single child node is simply wrong, because it doesn't allow for column number variances. Please remove it. This seems like straightforward cleanup, and maybe another candidate for a regression test. (Actually, I notice that the patch has NO regression tests at all, which surely can't be right for something of this type, though admittedly since we didn't have EXPLAIN (COSTS OFF) when this was first written it might have been difficult to write anything meaningful at the time.) 6. The dummy call to cost_sort() seems like a crock; what that function does doesn't seem particularly relevant to the cost of the merge operation. Just off the top of my head, it looks like the cost of the merge step will be roughly O(lg n) * the cost of comparing two tuples * the total number of tuples from all child paths. In practice it might be less, because once some of the paths run out of tuples the number of comparisons will drop, I think. But the magnitude of that effect seems difficult to predict, and may be rather small, so perhaps we should just ignore it. 7. I think there's some basic code cleanup needed here, also: comment formatting, variable naming, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Mon, Sep 20, 2010 at 10:57:00PM -0400, Robert Haas wrote: 2010/9/3 Hans-Jürgen Schönig h...@cybertec.at: On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier on today and it seems most important cases are working nicely. First, thanks for merging this up to HEAD. I took a look through this patch tonight, and the previous reviews thereof that I was able to find, most notably Tom's detailed review on 2009-07-26. I'm not sure whether or not it's accidental that this didn't get added to the CF, It's because I missed putting it in, and oversight I've corrected. If we need to bounce it on to the next one, them's the breaks. [points elided] 7. I think there's some basic code cleanup needed here, also: comment formatting, variable naming, etc. Hans-Jürgen, Will you be able to get to this in the next couple of days? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Sep 2, 2010, at 1:20 AM, Robert Haas wrote: On Sep 1, 2010, at 10:21 AM, Greg Stark gsst...@mit.edu wrote: For what it's worth I disagree with Tom. I think this is a situation where we need *both* types of solution. Ideally we will be able to use a plain Append node for cases where we know the relative ordering of the data in different partitions, but there will always be cases where the structured partition data doesn't actually match up with the ordering requested and we'll need to fall back to a merge-append node. I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers we have revised greg's wonderful work and ported the entire thing to head. it solves the problem of merge_append. i did some testing earlier on today and it seems most important cases are working nicely. here are some test cases: test=# \d t_data Table public.t_data Column | Type | Modifiers +-+--- id | integer | tstamp | date| test=# \d t_data_1 Table public.t_data_1 Column | Type | Modifiers +-+--- id | integer | tstamp | date| Indexes: idx_1 btree (id) Check constraints: t_data_1_id_check CHECK (id = 1 AND id = 1) Inherits: t_data test=# \d t_data_2 Table public.t_data_2 Column | Type | Modifiers +-+--- id | integer | tstamp | date| Indexes: idx_2 btree (id) Check constraints: t_data_2_id_check CHECK (id = 10001 AND id = 2) Inherits: t_data test=# \d t_data_3 Table public.t_data_3 Column | Type | Modifiers +-+--- id | integer | tstamp | date| Indexes: idx_3 btree (id) Check constraints: t_data_3_id_check CHECK (id = 20001 AND id = 3) Inherits: t_data simple windowing ... test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ; QUERY PLAN - WindowAgg (cost=149.99..2154.43 rows=32140 width=8) - Result (cost=149.99..1672.33 rows=32140 width=8) - Append (cost=149.99..1672.33 rows=32140 width=8) - Sort (cost=149.78..155.13 rows=2140 width=8) Sort Key: public.t_data.id - Seq Scan on t_data (cost=0.00..31.40 rows=2140 width=8) - Index Scan using idx_1 on t_data_1 t_data (cost=0.00..318.25 rows=1 width=8) - Index Scan using idx_2 on t_data_2 t_data (cost=0.00..318.25 rows=1 width=8) - Index Scan using idx_3 on t_data_3 t_data (cost=0.00..318.25 rows=1 width=8) (9 rows) it does a nice index scan; merges the stuff and puts it up into the high level doing the windowing. test=# select *, max(id) OVER ( ORDER BY id) from t_data LIMIT 10; id | tstamp | max ++- 1 | 2010-01-01 | 1 2 | 2010-01-01 | 2 3 | 2010-01-01 | 3 4 | 2010-01-01 | 4 5 | 2010-01-01 | 5 6 | 2010-01-01 | 6 7 | 2010-01-01 | 7 8 | 2010-01-01 | 8 9 | 2010-01-01 | 9 10 | 2010-01-01 | 10 (10 rows) the cost model does what it should as well: test=# explain select *, max(id) OVER ( ORDER BY id) from t_data ; QUERY PLAN - WindowAgg (cost=2872.41..3434.86 rows=32140 width=8) - Sort (cost=2872.41..2952.76 rows=32140 width=8) Sort Key: public.t_data.id - Result (cost=0.00..466.40 rows=32140 width=8) - Append (cost=0.00..466.40 rows=32140 width=8) - Seq Scan on t_data (cost=0.00..31.40 rows=2140 width=8) - Seq Scan on t_data_1 t_data (cost=0.00..145.00 rows=1 width=8) - Seq Scan on t_data_2 t_data (cost=0.00..145.00 rows=1 width=8) - Seq Scan on t_data_3 t_data (cost=0.00..145.00 rows=1 width=8) (9 rows) it has proven to be really valuable in my first tests. maybe this is helpful for some people out there. many thanks, hans merge-append-91-v1.diff Description: Binary data -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
Boszormenyi Zoltan z...@cybertec.at writes: we are experimenting with modifying table partitioning so the ORDER BY clause can be pushed down to child nodes on the grounds that: This is really premature, and anything you do along those lines now will probably never get committed. The problem is that the transformation you propose is wrong unless the planner can prove that the different child tables contain nonoverlapping ranges of the sort key. Now you might be intending to add logic to try to prove that from inspection of constraints, but I don't believe that reverse-engineering such knowledge on the fly is a sane approach: it will be hugely expensive and will add that cost even in many situations where the optimization fails to apply. The project direction is that we are going to add some explicit representation of partitioned tables. After that, the planner can just know immediately that a range-partitioned sort key is amenable to this treatment, and at that point it'll make sense to work on it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
2010/9/1 Boszormenyi Zoltan z...@cybertec.at: we are experimenting with modifying table partitioning so the ORDER BY clause can be pushed down to child nodes on the grounds that: 1. smaller datasets are faster to sort, e.g. two datasets that almost spill out to disk are faster to sort in memory and later merge them than the union set that spills out to disk, or two larger sets that spill out to disk are faster to sort individually than the union dataset (because of the longer seeks, etc) For what it's worth this logic doesn't necessarily hold. Sorting two data sets in memory is only faster if you only need one result set at a time. If you know the first set contains only records which come before the second then you can do this but if not then you'll need all the result sets to be available at the same time which will require spilling them to disk. If you have to do that then you might as well do a tapesort anyways. 2. individual child nodes can have indexes that produce the sorted output already This is the big win. Currently Append nodes mean throwing out any ordering from the child nodes and that's important information to be losing. Currently I am abusing the AppendPath node but later I will add a new executor node that will merge the sorted input coming from the children. You realize I already did this a few years ago. Search for merge append on the lists. The hard part -- as usual -- ends up being figuring out in the planner when to do this optimization and how to avoid exponential growth in the number of plans considered and the amount of data retained about each plan. For what it's worth I disagree with Tom. I think this is a situation where we need *both* types of solution. Ideally we will be able to use a plain Append node for cases where we know the relative ordering of the data in different partitions, but there will always be cases where the structured partition data doesn't actually match up with the ordering requested and we'll need to fall back to a merge-append node. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Sep 1, 2010, at 4:10 PM, Tom Lane wrote: Boszormenyi Zoltan z...@cybertec.at writes: we are experimenting with modifying table partitioning so the ORDER BY clause can be pushed down to child nodes on the grounds that: This is really premature, and anything you do along those lines now will probably never get committed. The problem is that the transformation you propose is wrong unless the planner can prove that the different child tables contain nonoverlapping ranges of the sort key. Now you might be intending to add logic to try to prove that from inspection of constraints, but I don't believe that reverse-engineering such knowledge on the fly is a sane approach: it will be hugely expensive and will add that cost even in many situations where the optimization fails to apply. well, why non-overlapping? the idea is to make append smart enough to take the sorted lists from below and merge them which will give sorted output as well. my original idea was what you described but given Martijn van Oosterhout's posting we were pretty confident that we can get along without non-overlapping partitions. The project direction is that we are going to add some explicit representation of partitioned tables. After that, the planner can just know immediately that a range-partitioned sort key is amenable to this treatment, and at that point it'll make sense to work on it. can you outline some ideas here and maybe point to some useful discussion here? many thanks, hans -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
=?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at writes: On Sep 1, 2010, at 4:10 PM, Tom Lane wrote: This is really premature, and anything you do along those lines now will probably never get committed. well, why non-overlapping? the idea is to make append smart enough to take the sorted lists from below and merge them which will give sorted output as well. Well, an extra merge step is going to change the cost comparisons quite a bit; see Greg Starks' comments. But in any case, my point wasn't that this is something we should never do; it was that it makes more sense to wait till something has happened with explicit partitioning. The project direction is that we are going to add some explicit representation of partitioned tables. can you outline some ideas here and maybe point to some useful discussion here? There's been boatloads of discussion of partitioning, and at least one submitted patch, over the past year or so ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
hello tom, yeah, we have followed quite a lot of discussion as well ... and yes, no patches. as far as this problem is concerned: we are working on a patch and did some prototyping inside the planner already (attached). the code we have is pretty limited atm (such as checking for a sort clause explicitly and so on - it has no support for windowing related optimizations and so on so far). the cost model is not our problem - it is a lot easier to read than the code we are fighting with (it is a different level of complexity). i think costs can be handled. yes, this merging adds some costs for sure. we might see a hell amount of operators being called - but i think given a reasonable number of partitions it is still a lot cheaper than actually resorting ... and, it is a lot more space efficient. in my practical case i cannot resort because we would simply run out of space. an index scan is expensive but needs no additional sort space ... and, merge is O(n) which sort is clearly not. advise is highly appreciated. many thanks, hans push-down-sort-into-inh-2.patch Description: Binary data On Sep 1, 2010, at 5:00 PM, Tom Lane wrote: =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at writes: On Sep 1, 2010, at 4:10 PM, Tom Lane wrote: This is really premature, and anything you do along those lines now will probably never get committed. well, why non-overlapping? the idea is to make append smart enough to take the sorted lists from below and merge them which will give sorted output as well. Well, an extra merge step is going to change the cost comparisons quite a bit; see Greg Starks' comments. But in any case, my point wasn't that this is something we should never do; it was that it makes more sense to wait till something has happened with explicit partitioning. The project direction is that we are going to add some explicit representation of partitioned tables. can you outline some ideas here and maybe point to some useful discussion here? There's been boatloads of discussion of partitioning, and at least one submitted patch, over the past year or so ... regards, tom lane -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Path question
On Sep 1, 2010, at 10:21 AM, Greg Stark gsst...@mit.edu wrote: For what it's worth I disagree with Tom. I think this is a situation where we need *both* types of solution. Ideally we will be able to use a plain Append node for cases where we know the relative ordering of the data in different partitions, but there will always be cases where the structured partition data doesn't actually match up with the ordering requested and we'll need to fall back to a merge-append node. I agree. Explicit partitioning may open up some additional optimization possibilities in certain cases, but Merge Append is more general and extremely valuable in its own right. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers