I believe I have reproduced the behavior described by Andrew Hammond in
his report.

The steps below use the regression database:

1. In session 1, do

        set default_statistics_target TO 10000;
        analyze tenk1;

(We need the large stats target to ensure that tenk1's pg_statistic
entries require toasting.)

2. Attach to session 1 with a debugger and set a breakpoint at
CommitTransaction's call to CallXactCallbacks (or anyplace after
ProcArrayEndTransaction and before AtEOXact_Inval).

3. In session 2, do

        select count(*) from tenk1 where fivethous < 2500;

(This loads up session 2's syscaches with toasted pg_statistic entries.)

4. In session 1, again do

        analyze tenk1;

and wait for it to stop at the breakpoint.

5. In session 3 (or you can use session 2 for this), do

        vacuum verbose pg_statistic;

You should see it removing toast entries that were generated in step 1
and obsoleted in step 4.

6. In session 2, again do

        select count(*) from tenk1 where fivethous < 2500;

and voila:

        ERROR:  missing chunk number 0 for toast value 53668 in pg_toast_2619

What has happened here is that the second ANALYZE has marked itself
committed in pg_clog and no longer running in the ProcArray, so VACUUM
feels entitled to remove toast tuples that the ANALYZE deleted.  However,
the ANALYZE has not yet sent out the sinval messages that would inform
session 2 that its syscache entries are obsolete.  In Andrew's report,
presumably the machine was under enough load to slow down ANALYZE at
just this point, and there was a concurrent autovacuum that would have
done the rest of the deed.  The problem could only be seen for a short
interval, which squares with his report, and with a similar one from
Tim Uckun back in September.

Ordinarily, sending out sinval messages post-commit is okay because we
don't release locks until after that, and we suppose that our locks
prevent any other transactions from getting to the point of using
syscache entries that might have been invalidated by our transaction.
However, *we have carefully hacked on ANALYZE until it doesn't take any
locks that would block concurrent queries on the analyzed table.*  So
the normal protection against stale syscache entries simply doesn't
work for pg_statistic fetches.
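To make the window concrete, here is a toy Python model of the race
(all names are invented for illustration; this is not PostgreSQL code):
the cache stores an out-of-line "toast pointer" rather than the value
itself, ANALYZE's commit becomes visible before its invalidation
messages are delivered, and VACUUM reclaims the dead toast rows in
that window, leaving the reader's cached pointer dangling.

```python
# Toy model of the ANALYZE / VACUUM / syscache race.
# All names are invented for illustration; not PostgreSQL code.

toast_heap = {}            # toast value id -> out-of-line data
next_toast_id = [100]

def toast_store(value):
    """Store a value out of line, return a 'toast pointer' (its id)."""
    tid = next_toast_id[0]; next_toast_id[0] += 1
    toast_heap[tid] = value
    return tid

# The pg_statistic row for tenk1 holds a toast pointer; readers cache it.
pg_statistic = {"tenk1": toast_store("stats v1")}
syscache = {}
invalidation_queue = []

def reader_fetch(rel):
    """Session 2: trust the cached pointer if present (no lock taken)."""
    if rel not in syscache:
        syscache[rel] = pg_statistic[rel]
    tid = syscache[rel]
    if tid not in toast_heap:
        raise RuntimeError(f"missing chunk number 0 for toast value {tid}")
    return toast_heap[tid]

def analyze_commit(rel, new_value):
    """Session 1: the replacement commits and is visible at once,
    but the sinval message sits in the queue until later."""
    old = pg_statistic[rel]
    pg_statistic[rel] = toast_store(new_value)
    invalidation_queue.append(rel)   # NOT yet delivered
    return old                       # old toast value is now dead

def vacuum(dead_tid):
    """Session 3: the deleter is committed and gone from the ProcArray,
    so VACUUM feels entitled to reclaim its dead toast rows."""
    del toast_heap[dead_tid]

def deliver_invalidations():
    while invalidation_queue:
        syscache.pop(invalidation_queue.pop(), None)

# Step 3: the reader caches the v1 pointer.
assert reader_fetch("tenk1") == "stats v1"
# Steps 4-5: ANALYZE commits v2 but stalls before sending sinval;
# VACUUM reclaims the v1 toast rows inside that window.
dead = analyze_commit("tenk1", "stats v2")
vacuum(dead)
# Step 6: the stale cache entry dangles -> the reported error.
try:
    reader_fetch("tenk1")
except RuntimeError as e:
    print(e)    # missing chunk number 0 for toast value 100
deliver_invalidations()
assert reader_fetch("tenk1") == "stats v2"
```

Once the queued invalidation is finally delivered, the reader refetches
the row and recovers, which is why the failure is visible only briefly.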

I'm not sure about a good way to fix this.  When we last dealt with a
similar failure, Heikki suggested that we forcibly detoast all fields in
a tuple that we're putting into the syscaches:
I don't much like that, though, as it seems expensive, and I'm worried
about possible circularity of needing to know about all toastable fields
while making a syscache entry, and anyway it's papering over a symptom
rather than solving the actual problem that we're relying on a stale
syscache entry.
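As a hedged sketch of Heikki's idea (again with invented names, not
PostgreSQL code): if the cache copies the value itself at insertion
time instead of holding an out-of-line pointer, later removal of the
toast rows cannot hurt the cached entry, at the cost of eagerly
fetching possibly large values.

```python
# Toy sketch of forcibly detoasting every out-of-line field while
# building a cache entry. Invented names; not PostgreSQL code.

toast_heap = {7: "big stats array"}        # toast id -> stored value
pg_statistic = {"tenk1": ("toast_ptr", 7)}
syscache = {}

def cache_insert_flattened(rel):
    kind, payload = pg_statistic[rel]
    if kind == "toast_ptr":
        # Eager detoast: the cache holds its own copy of the value,
        # so it no longer depends on the toast rows staying alive.
        syscache[rel] = ("inline", toast_heap[payload])
    else:
        syscache[rel] = (kind, payload)

cache_insert_flattened("tenk1")
del toast_heap[7]                # VACUUM reclaims the toast rows
kind, value = syscache["tenk1"]
print(value)                     # still readable: big stats array
```

This sidesteps the dangling pointer but, as noted above, pays the
detoasting cost on every cache load and still leaves the entry stale.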

We could fix it by not using a syscache anymore for pg_statistic
entries, but that's probably not acceptable from a performance
standpoint.

A clean fix would be to add locking that blocks would-be users of
pg_statistic entries when an ANALYZE is about to commit.  This isn't
much fun from a performance standpoint either, but at least it should be
relatively cheap most of the time.
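A minimal sketch of that scheme (invented names; not a PostgreSQL
implementation): would-be readers of pg_statistic take a lock briefly
before trusting a cached entry, and ANALYZE holds it across the
commit-to-sinval window, so a reader either sees the old toast rows
still alive or the invalidation already delivered.

```python
# Toy sketch of blocking pg_statistic readers across ANALYZE's
# commit window. Invented names; not PostgreSQL code.
import threading

stats_commit_lock = threading.Lock()

store = {"tenk1": "stats v1"}    # committed pg_statistic contents
cache = {"tenk1": "stats v1"}    # reader's syscache

def analyze_commit_with_lock(rel, new_value):
    # ANALYZE holds the lock from just before commit until its
    # invalidation messages are out, closing the dangerous window.
    with stats_commit_lock:
        store[rel] = new_value   # commit becomes visible
        cache.pop(rel, None)     # sinval delivered before lock release

def reader_fetch(rel):
    # Readers take the same lock before trusting the cache, so they
    # cannot run inside ANALYZE's commit window.
    with stats_commit_lock:
        if rel not in cache:
            cache[rel] = store[rel]
        return cache[rel]

analyze_commit_with_lock("tenk1", "stats v2")
print(reader_fetch("tenk1"))     # stats v2
```

In the common case the lock is uncontended, which is why this should
be relatively cheap most of the time.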


                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)