t> It seems to me this can't possibly work because of race conditions. > In particular, what happens when some query dirties a page and thereby > clears its fully-visible bit? Presumably, any such query would have > to (1) recompute the number of all-visible rows on that page (already > an expensive thing) and then (2) go and subtract that from the counter > (meaning the counter becomes a serialization bottleneck for all updates > on the table, which is exactly the reason we don't just have a globally > maintained row counter already).. Unless what you're saying is that the index creation itself invites race conditions, I don't think this is true. The general idea here is to not actually compute the number of rows on a page (or whatever other aggregate) unless (1) the index is being created, (2) it's requested by the user, (3) VACUUM is occurring, and not to update the counter except on (1) and (3). I am not sure off the top of my head about (1), but it seems from comments in vac_update_relstats that we are already making the implicit assumption that VACUUM is not running concurrently with another VACUUM on the same table. A query that dirties a page doesn't have to perform any additional work than it does already, under any circumstances, under this model.
> But worse, what happens if a count(*) > is in progress? It might or might not have scanned this page already, > and there's no way to get the right answer in both cases. Counter > updates done by VACUUM would have a similar race-condition problem. I don't think the first part really matters. If the page was visible when COUNT(*) started and then got dirtied by some other transaction, any changes by the second transaction shouldn't alter the actual count in our transaction anyway, so whether we scan the page needlessly or not seems beside the point. If that did matter, index only scans using COUNT(*) wouldn't work properly, since they make the same basic assumption. VACUUM counter updates, on the other hand, initially seem more problematic, since if we grab the value of the counter, then VACUUM updates the counter and the visbility bits, and then we check which visibility bits are set, we might skip pages we really need to check (since we're using an old value for the counter). But in practice I am pretty sure this is a nonissue, because as long as our COUNT(*) transaction is going, no pages can have their visibility bits marked unless they haven't been updated since the transaction started, which consequently means we won't see the counter updated (the counter is intended to be updated immediately before the visibility bit is set while VACUUMing). Took me a while to figure this out, for thanks for pointing it out :) > >> Please critique this idea and let me know whether it is worth pursuing >> further. > > I doubt it. So do I, but not for the reasons you listed, but because we can't rely on dead tuples sticking around for us to count them and generate deltas consistently. I think there is probably a way to get around that problem in a sane way with a more traditional index, though. > > regards, tom lane -- Josh -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers