t> It seems to me this can't possibly work because of race conditions.
> In particular, what happens when some query dirties a page and thereby
> clears its fully-visible bit?  Presumably, any such query would have
> to (1) recompute the number of all-visible rows on that page (already
> an expensive thing) and then (2) go and subtract that from the counter
> (meaning the counter becomes a serialization bottleneck for all updates
> on the table, which is exactly the reason we don't just have a globally
> maintained row counter already)..
Unless what you're saying is that the index creation itself invites
race conditions, I don't think this is true.  The general idea here is
to not actually compute the number of rows on a page (or whatever
other aggregate) unless (1) the index is being created, (2) it's
requested by the user, (3) VACUUM is occurring, and not to update the
counter except on (1) and (3).  I am not sure off the top of my head
about (1), but it seems from comments in vac_update_relstats that we
are already making the implicit assumption that VACUUM is not running
concurrently with another VACUUM on the same table.  A query that
dirties a page doesn't have to perform any additional work than it
does already, under any circumstances, under this model.

> But worse, what happens if a count(*)
> is in progress?  It might or might not have scanned this page already,
> and there's no way to get the right answer in both cases.  Counter
> updates done by VACUUM would have a similar race-condition problem.
I don't think the first part really matters.  If the page was visible
when COUNT(*) started and then got dirtied by some other transaction,
any changes by the second transaction shouldn't alter the actual count
in our transaction anyway, so whether we scan the page needlessly or
not seems beside the point.  If that did matter, index only scans
using COUNT(*) wouldn't work properly, since they make the same basic

VACUUM counter updates, on the other hand, initially seem more
problematic, since if we grab the value of the counter, then VACUUM
updates the counter and the visbility bits, and then we check which
visibility bits are set, we might skip pages we really need to check
(since we're using an old value for the counter).  But in practice I
am pretty sure this is a nonissue, because as long as our COUNT(*)
transaction is going, no pages can have their visibility bits marked
unless they haven't been updated since the transaction started, which
consequently means we won't see the counter updated (the counter is
intended to be updated immediately before the visibility bit is set
while VACUUMing).  Took me a while to figure this out, for thanks for
pointing it out :)

>> Please critique this idea and let me know whether it is worth pursuing 
>> further.
> I doubt it.
So do I, but not for the reasons you listed, but because we can't rely
on dead tuples sticking around for us to count them and generate
deltas consistently.  I think there is probably a way to get around
that problem in a sane way with a more traditional index, though.
>                         regards, tom lane


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to