Re: [HACKERS] Proposal: COUNT(*) (and related) speedup

Joshua Yanovski Fri, 04 Apr 2014 06:23:09 -0700

>From feedback on IRC, two immediately obvious technical problems:

* Heap pruning can happen at any time, not just during VACUUM and HOT
updates.  This is obviously a pretty significant issue and I doubt the
easy solution (don't do heap pruning for tables with an index like
this) would be acceptable.
* While relfrozenxid works for this in 9.4, it's not really related to
this purpose and its implementation could change at any time, so it is
not reasonable to rely on it.  A solution that would still work, but
not be as fast, would be to just not have a minimum for calculating
deltas.


On Fri, Apr 4, 2014 at 5:38 AM, Joshua Yanovski <[email protected]> wrote:
> Hey all,
>
> First off, please feel free to let me know if this idea is a waste of time :)
>
> I was thinking about COUNT(*) and its slow performance today, and I
> thought about whether we could get better performance by taking a page
> out of index-only-scans.
>
> Essentially, the idea is that you would store a counter (let's say, as
> a special index type) that would initially (on index creation) be set
> to the total count of
> all rows on fully visible pages (visibility map bit set to 1).
> Whenever a query was made on the table for COUNT(*), we would not even
> bother to visit any pages
> that were in the visibility map, but instead visit the heap for every
> page that was not marked visible and compute the delta between the
> count of tuples visible
> at relfrozenxid and tuples currently visible.  The deltas on all the
> pages would then be summed with the original counter and returned as
> the count of the table.
> The counter would be updated only by VACUUM, which would perform the
> same computation performed by the COUNT operation but add it
> permanently to counter just before
> it set the visible_map bit to 1 (I am not certain whether this would
> require unusual changes to WAL replay or not).  There may be some
> issues with this idea, since
> I am not too familiar with the Postgres codebase, but Andrew Gierth
> was kind enough to put up with my bugging him in the IRC channel all
> night and helped me work through
> some of this stuff so I think it should at least be coherent.
>
> The upshot of the above proposal is that in situations where an index
> only scan would have helped before, now you might be able to skip
> almost all the I/O for COUNT(*),
> while still getting a completely accurate result.
>
> The disadvantages are pretty clear, I think, but let me list them
> anyway (that I can think of):
> * Adding an extra index type that can do only one thing
> * Adding an index type that can only do that one thing efficiently if
> you happen to want to scan over the entire table
> * Adding an index type for a use case that isn't very compelling
> (SELECT COUNT(*) FROM table)
> * There may be additional concerns about this being costed correctly,
> although I think there is probably enough information to make an
> informed decision here.
> * It can't be added as a contrib module (probably) and also be crash-safe.
>
> However, if the patch for negative transition aggregate functions
> lands, I think this idea becomes a bit more compelling.  Then instead
> of just being a technique for
> COUNT(*), it effectively becomes a very low-memory way of speeding up
> a large number of aggregate queries (any that have an inverse) while
> maintaining accuracy, in
> situations where most of the pages never get modified.  The only
> change to the above algorithm is that delta is computed as application
> of the inverse function to "old"
> tuples and the direct function to "new" ones on the page (I think this
> can likely be done during vacuum index cleanup).
>
> Please critique this idea and let me know whether it is worth pursuing 
> further.
>
> Thanks!
>
> -- Joshua Yanovski



-- 
Josh


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal: COUNT(*) (and related) speedup

Reply via email to