First off, please feel free to let me know if this idea is a waste of time :)
I was thinking about COUNT(*) and its slow performance today, and I
thought about whether we could get better performance by taking a page
out of index-only-scans.
Essentially, the idea is that you would store a counter (let's say, as
a special index type) that would initially (on index creation) be set
to the total count of
all rows on fully visible pages (visibility map bit set to 1).
Whenever a query was made on the table for COUNT(*), we would not even
bother to visit any pages
that were in the visibility map, but instead visit the heap for every
page that was not marked visible and compute the delta between the
count of tuples visible
at relfrozenxid and tuples currently visible. The deltas on all the
pages would then be summed with the original counter and returned as
the count of the table.
The counter would be updated only by VACUUM, which would perform the
same computation performed by the COUNT operation but add it
permanently to counter just before
it set the visible_map bit to 1 (I am not certain whether this would
require unusual changes to WAL replay or not). There may be some
issues with this idea, since
I am not too familiar with the Postgres codebase, but Andrew Gierth
was kind enough to put up with my bugging him in the IRC channel all
night and helped me work through
some of this stuff so I think it should at least be coherent.
The upshot of the above proposal is that in situations where an index
only scan would have helped before, now you might be able to skip
almost all the I/O for COUNT(*),
while still getting a completely accurate result.
The disadvantages are pretty clear, I think, but let me list them
anyway (that I can think of):
* Adding an extra index type that can do only one thing
* Adding an index type that can only do that one thing efficiently if
you happen to want to scan over the entire table
* Adding an index type for a use case that isn't very compelling
(SELECT COUNT(*) FROM table)
* There may be additional concerns about this being costed correctly,
although I think there is probably enough information to make an
informed decision here.
* It can't be added as a contrib module (probably) and also be crash-safe.
However, if the patch for negative transition aggregate functions
lands, I think this idea becomes a bit more compelling. Then instead
of just being a technique for
COUNT(*), it effectively becomes a very low-memory way of speeding up
a large number of aggregate queries (any that have an inverse) while
maintaining accuracy, in
situations where most of the pages never get modified. The only
change to the above algorithm is that delta is computed as application
of the inverse function to "old"
tuples and the direct function to "new" ones on the page (I think this
can likely be done during vacuum index cleanup).
Please critique this idea and let me know whether it is worth pursuing further.
-- Joshua Yanovski
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: