Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 14:54, Tom Lane wrote: > Jeremy Harris writes: >> On 14/11/14 00:46, Simon Riggs wrote: >>> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >>> -> Sort (cost= rows=568733 width=175) (actual time= >>> rows=20 loops=1) >>> Sort Method: top-N heapsort > >> Going off on a tangent, when I was playing with a merge-sort >> implementation I propagated limit information into the sort >> node, for a significant win. > > I'm not entirely following. The top-N heapsort approach already > makes use of the limit info. Having gone back to look, you're right. It was Uniq nodes I merged (the sort handles both bounded-output and dedup). -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Jeremy Harris writes: > On 14/11/14 00:46, Simon Riggs wrote: >> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >> -> Sort (cost= rows=568733 width=175) (actual time= >> rows=20 loops=1) >> Sort Method: top-N heapsort > Going off on a tangent, when I was playing with a merge-sort > implementation I propagated limit information into the sort > node, for a significant win. I'm not entirely following. The top-N heapsort approach already makes use of the limit info. If the limit is so large that the sort spills to disk, then we stop thinking about the limit. But I'm finding it doubtful either that that's a case worthy of extra code or that you could get very much win if you did add code for it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
On 14/11/14 00:46, Simon Riggs wrote: > Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >-> Sort (cost= rows=568733 width=175) (actual time= > rows=20 loops=1) > Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort implementation I propagated limit information into the sort node, for a significant win. Eliding the Limit node gave a further slight win. I wasn't convinced the use-case was common enough to justify the replacement of quicksort (despite having consistently fewer compares, the merge sort was slightly slower. I never understood why) - but I never asked. Is there any appetite for supporting alternate sort algorithms? -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
David G Johnston writes: > Tom Lane-2 wrote >> [ shrug... ] The estimated value is the planner's estimate of what would >> happen *if you ran the node to completion*, which in practice doesn't >> happen because of the LIMIT. > I don't see how a sort node cannot run to completion... The sort must have read all of its *input*, or it can't be sure it's giving the correct first result row. But "run to completion" means that it delivered all of its *output*, which obviously does not happen when under a LIMIT. It's entirely possible BTW that the sort's internal processing is not complete when it starts returning rows. For example, when we do a spill-to-disk merge sort, the final merge pass is typically done on-the-fly while returning rows, and so some fraction of that processing may never be completed if the query stops early. It's still seen all the input rows, but it hasn't completely determined their ordering. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Tom Lane-2 wrote > [ shrug... ] The estimated value is the planner's estimate of what would > happen *if you ran the node to completion*, which in practice doesn't > happen because of the LIMIT. I don't see how a sort node cannot run to completion...raising the thought that the "actual" row count should not be 10 but should equal whatever the input row count size is. I guess there may be some efficient algorithms and/or inputs that make a sort go slower or faster but ultimately the node would have to ensure that every input row has been sorted before it can return control to its parent node. If the parent only cares about the first 10 rows of the now-sorted relation the sort node doesn't know or care. David J. -- View this message in context: http://postgresql.nabble.com/EXPLAIN-ANALYZE-output-weird-for-Top-N-Sort-tp5826922p5826938.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
David G Johnston writes: > Tom Lane-2 wrote >> [ shrug... ] The estimated value is the planner's estimate of what would >> happen *if you ran the node to completion*, which in practice doesn't >> happen because of the LIMIT. The actual value is, well, the actual value. >> We certainly should not munge around the actual value. >> >> We could imagine munging the reported estimates to account for the parent >> LIMIT, but that would make it a lot harder to understand the planner's >> "thought processes", because the reported estimates would have that much >> less to do with the numbers actually used in the internal calculations. > Is it even possible for a sort node directly under a limit to output (as > nebulous as that term is in this context) more rows that desired by the > limit? No. The "actual" rowcount is the number of rows returned by the node, and since the LIMIT node will stop calling its child for new rows once it's satisfied, there's no way for more rows to be returned. > The interesting thing about a sort node is not its output but its input - > i.e., the number of rows being fed to it via the node nested under it. Right, which you can read off directly from the EXPLAIN output as being the actual number of rows output by its child node. > Which prompts the question whether it would be good to show that value as an > attribute of the sort node during EXPLAIN ANALYZE instead of having to scan > down to the child node. It would not be useful IMO to duplicate the information; EXPLAIN output tends to be too voluminous already. I suppose somebody could do a thorough redesign of EXPLAIN's output layout to associate numbers with different plan levels than they are today ... but I'm less than convinced that it'd be an improvement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Tom Lane-2 wrote > Simon Riggs < > simon@ > > writes: >> Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >>-> Sort (cost= rows=568733 width=175) (actual time= >> rows=20 loops=1) >> Sort Method: top-N heapsort > >> The Sort estimate shows 568733 rows, whereas the actual rows are 20. > > [ shrug... ] The estimated value is the planner's estimate of what would > happen *if you ran the node to completion*, which in practice doesn't > happen because of the LIMIT. The actual value is, well, the actual value. > We certainly should not munge around the actual value. > > We could imagine munging the reported estimates to account for the parent > LIMIT, but that would make it a lot harder to understand the planner's > "thought processes", because the reported estimates would have that much > less to do with the numbers actually used in the internal calculations. Is it even possible for a sort node directly under a limit to output (as nebulous as that term is in this context) more rows that desired by the limit? The interesting thing about a sort node is not its output but its input - i.e., the number of rows being fed to it via the node nested under it. Which prompts the question whether it would be good to show that value as an attribute of the sort node during EXPLAIN ANALYZE instead of having to scan down to the child node. I guess you can argue that we are currently since that is the same value as the estimated rows returned. If you were to change that to reflect the impact of the parent limit node you'd probably want to add something else to reflect the child input size (in rows, not memory). >From a pure theory standpoint having the estimated rows reflect the input size instead of the output size seems wrong. In the presence of limit it won't output more than N rows whereas in all other cases the input and the output will be identical. That said I am only pondering this concept because of this thread - it would help to know what sparked all of this in the first place. From a practical perspective the current behavior captures the most important aspect of the sort - the size of the input - and the user knowing of the limit isn't likely to wonder whether we are somehow being wasteful by "returning" the extra rows; which are not returned so much as scanned over in place by the parent node. David J. -- View this message in context: http://postgresql.nabble.com/EXPLAIN-ANALYZE-output-weird-for-Top-N-Sort-tp5826922p5826935.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Simon Riggs writes: > Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) >-> Sort (cost= rows=568733 width=175) (actual time= > rows=20 loops=1) > Sort Method: top-N heapsort > The Sort estimate shows 568733 rows, whereas the actual rows are 20. [ shrug... ] The estimated value is the planner's estimate of what would happen *if you ran the node to completion*, which in practice doesn't happen because of the LIMIT. The actual value is, well, the actual value. We certainly should not munge around the actual value. We could imagine munging the reported estimates to account for the parent LIMIT, but that would make it a lot harder to understand the planner's "thought processes", because the reported estimates would have that much less to do with the numbers actually used in the internal calculations. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort
Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) -> Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort The Sort estimate shows 568733 rows, whereas the actual rows are 20. Both are correct, in a way. The node feeding the Sort shows an actual of 379114 rows. If we are looking at rows returned, then the Sort node estimate should say 20 If we are looking at rows processed, then the Sort node should have actual rows=379114 I think we should say the latter; i,e, the Sort node should report 379114 rows, not 20 rows. Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers