Tom Lane-2 wrote
> Simon Riggs <

> simon@

> > writes:
>> Limit  (cost=.... rows=20 width=175) (actual time=.... rows=20 loops=1)
>>    ->  Sort  (cost=.... rows=568733 width=175) (actual time=....
>> rows=20 loops=1)
>>          Sort Method: top-N heapsort
> 
>> The Sort estimate shows 568733 rows, whereas the actual rows are 20.
> 
> [ shrug... ]  The estimated value is the planner's estimate of what would
> happen *if you ran the node to completion*, which in practice doesn't
> happen because of the LIMIT.  The actual value is, well, the actual value.
> We certainly should not munge around the actual value.
> 
> We could imagine munging the reported estimates to account for the parent
> LIMIT, but that would make it a lot harder to understand the planner's
> "thought processes", because the reported estimates would have that much
> less to do with the numbers actually used in the internal calculations.

Is it even possible for a sort node directly under a limit to output (as
nebulous as that term is in this context) more rows that desired by the
limit?

The interesting thing about a sort node is not its output but its input -
i.e., the number of rows being fed to it via the node nested under it. 
Which prompts the question whether it would be good to show that value as an
attribute of the sort node during EXPLAIN ANALYZE instead of having to scan
down to the child node.  I guess you can argue that we are currently since
that is the same value as the estimated rows returned.  If you were to
change that to reflect the impact of the parent limit node you'd probably
want to add something else to reflect the child input size (in rows, not
memory).

>From a pure theory standpoint having the estimated rows reflect the input
size instead of the output size seems wrong.  In the presence of limit it
won't output more than N rows whereas in all other cases the input and the
output will be identical.  That said I am only pondering this concept
because of this thread - it would help to know what sparked all of this in
the first place.  From a practical perspective the current behavior captures
the most important aspect of the sort - the size of the input - and the user
knowing of the limit isn't likely to wonder whether we are somehow being
wasteful by "returning" the extra rows; which are not returned so much as
scanned over in place by the parent node.

David J.






--
View this message in context: 
http://postgresql.nabble.com/EXPLAIN-ANALYZE-output-weird-for-Top-N-Sort-tp5826922p5826935.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to