On Tue, Oct 20, 2015 at 4:17 AM, Alexander Korotkov <aekorot...@gmail.com> wrote: > Planner regression is fixed in the attached version of patch. It appears > that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no > ordering is required.
I took a look at this. My remarks are not comprehensive, but are just what I noticed having looked at this for the first time in well over a year. Before anything else, I would like to emphasize that I think that this is pretty important work; it's not just a "nice to have". I'm very glad you picked it up, because we need it. In the real world, there will be *lots* of cases that this helps. Explain output ------------------- A query like your original test query looks like this for me: postgres=# explain analyze select * from test order by v1, v2 limit 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=429.54..434.53 rows=100 width=16) (actual time=15125.819..22414.038 rows=100 loops=1) -> Partial sort (cost=429.54..50295.52 rows=1000000 width=16) (actual time=15125.799..22413.297 rows=100 loops=1) Sort Key: v1, v2 Presorted Key: v1 Sort Method: top-N heapsort Memory: 27kB -> Index Scan using test_v1_idx on test (cost=0.42..47604.43 rows=1000000 width=16) (actual time=1.663..13.066 rows=151 loops=1) Planning time: 0.948 ms Execution time: 22414.895 ms (8 rows) I thought about it for a while, and decided that you have the basic shape of the explain output right here. I see where you are going by having the sort node drive things. I don't think the node should be called "Partial sort", though. I think that this is better presented as just a "Sort" node with a presorted key. I think it might be a good idea to also have a "Sort Groups: 2" field above. That illustrates that you are in fact performing 2 small sorts per group, which is the reality. As you said, it's good to have this be high, because then the sort operations don't need to do too many comparisons, which could be expensive. Sort Method ---------------- Even thought the explain analyze above shows "top-N heapsort" as its sort method, that isn't really true. I actually ran this through a debugger, which is why the above plan took so long to execute, in case you wondered. I saw that in practice the first sort executed for the first group uses a quicksort, while only the second sort (needed for the 2 and last group in this example) used a top-N heapsort. Is it really worth using a top-N heapsort to avoid sorting the last little bit of tuples in the last group? Maybe we should never push down to an individual sort operation (we have one tuplesort_performsort() per group) that it should be bounded. Our quicksort falls back on insertion sort in the event of only having 7 elements (at that level of recursion), so having this almost always use quicksort may be no bad thing. Even if you don't like that, the "Sort Method" shown above is just misleading. I wonder, also, if you need to be more careful about whether or not "Memory" is really the high watermark, as opposed to the memory used by the last sort operation of many. There could be many large tuples in one grouping, for example. Note that the current code will not show any "Memory" in explain analyze for cases that have memory freed before execution is done, which this is not consistent with. Maybe that's not so important. Unsure. trace_sort output shows that these queries often use a large number of tiny individual sorts. Maybe that's okay, or maybe we should make it clearer that the sorts are related. I now use trace_sort a lot. Abbreviated Keys ----------------------- It could be very bad for performance that the first non-presorted key uses abbreviated keys. There needs to be a way to tell tuplesort to not waste its time with them, just as there currently is for bounded (top-N heapsort) sorts. They're almost certainly the wrong way to go, unless you have huge groups (where partial sorting is unlikely to win in the first place). Other issues in executor -------------------------------- This is sort of an optimizer issue, but code lives in execAmi.c. Assert is redundant here: + case T_Sort: + /* We shouldn't reach here without having plan node */ + Assert(node); + /* With skipCols sort node holds only last bucket */ + if (node && ((Sort *)node)->skipCols == 0) + return true; + else + return false; I don't like that you've added a Plan node argument to ExecMaterializesOutput() in this function, too. There is similar assert/pointer test code within ExecSupportsBackwardScan() and ExecSupportsMarkRestore(). In general, I have concerns about the way the determination of a sort's ability to do stuff like be scanned backwards is now made dynamic, which this new code demonstrates: /* + * skipCols can't be used with either EXEC_FLAG_REWIND, EXEC_FLAG_BACKWARD + * or EXEC_FLAG_MARK, because we hold only current bucket in + * tuplesortstate. + */ + Assert(node->skipCols == 0 || (eflags & (EXEC_FLAG_REWIND | + EXEC_FLAG_BACKWARD | + EXEC_FLAG_MARK)) == 0); + I need to think some more about this general issue. Misc. issues ---------------- _readSort() needs READ_INT_FIELD(). _outSort() similarly needs WRITE_INT_FIELD(). You've mostly missed this stuff. Please be more careful about this. It's always a good idea to run the regression tests with "#define COPY_PARSE_PLAN_TREES" from time to time, which tends to highlight these problems. tuplesort.h should not include sortsupport.h. It's a modularity violation, and besides which is unnecessary. Similarly, pathkeys.c should not include optimizer/cost.h. What is this? + if (inner_cheapest_total && inner_cheapest_total->pathtype == T_Sort) + elog(ERROR, "Sort"); Optimizer ------------- I am not an expert on the optimizer, but I do have some feedback. * cost_sort() needs way way more comments. Doesn't even mention indexes. Not worth commenting further on until I know what it's *supposed* to do, though. * pathkeys_useful_for_ordering() now looks like a private convenience wrapper for the new public function pathkeys_common(). I think that comments should make this quite clear. * compare_bifractional_path_costs() should live beside compare_fractional_path_costs() and be public, I think. The existing compare_fractional_path_costs() also only has a small number of possible clients, and is still not static. * Think it's not okay that there are new arguments, such as the "tuples" argument for get_cheapest_fractional_path_for_pathkeys(). It seems a bad sign, design-wise, that a new argument of "PlannerInfo *root" was added at end, for the narrow purpose of passing stuff to estimate number of groups for the benefit of this patch. ISTM that grouping_planner() caller should do the work itself as and when it alone needs to. * New loop within get_cheapest_fractional_path_for_pathkeys() requires far more explanation. Explain theory behind derivation of compare_bifractional_path_costs() fraction arguments, please. I think there might be simple heuristics like this elsewhere in the optimizer or selfuncs.c, but you need to share why you did things that way in the code. * Within planner.c, "partial_sort_path" variable name in grouping_planner() might be called something else. Its purpose isn't clear. Also, the way that you mix path costs from the new get_cheapest_fractional_path_for_pathkeys() into the new cost_sort() needs to be explained in detail (as I already said, cost_sort() is currently very under-documented). Obviously the optimizer part of this needs the most work -- no surprises there. I wonder if we cover all useful cases? I haven't yet got around to using "#define OPTIMIZER_DEBUG" to see what's really going on, which seems essential to understanding what is really happening, but it looks like merge append paths can currently use the optimization, whereas unique paths cannot. Have you thought about that? That's all I have for now... -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers