Hi All,

we are running postgresql 9.1.15 on Debian.

we are, basically, running a postgresql cluster with two nodes. We are using 
synchronous streaming replication to make sure that the slave is always fully 
in sync (using a recovery.conf that points out the master). The slave is mainly 
used to maintain high availability. We also run backups on the slave to reduce 
load on the master.

On both nodes, postgresqls data is stored on a SAN.

At some point, we restarted the slave. The first thing it does is to do a 
pg_basebackup and then start streaming changes based on that. This seemed to 
work fine, until 16 hrs later when we made a backup on the slave, using 
pg_dump. pg_dump was unable to complete, due to a pg_toast error (ERROR: 
missing chunk number 0 for toast value 2753291 in pg_toast_22066).

The issue remained until we ran a full vacuum analyze on the cluster.

We have been running in this way on several clusters for some years now, 
basically since 9.1 was released, without seeing this issue. Since we upgraded 
to postgresql 9.1.15, we have seen it twice. This does not necessarily mean 
anything, but I thought I would mention it.

The issue is resolved now, but I would still like to understand what happened.
I have logfiles from the incident, but I cannot see anything out of the 
ordinary (despite having a fair amount of experience investigating postgresql 
logs).

I have read that this kind of issues are most frequently due to hardware issues 
or bugs in postgresql.

Using <https://bucardo.org/postgres_all_versions.html#version_9.1.15>,
I have reviewed the release notes from the subsequent 9.1 releases 
(9.1.15-9.1.19),
but I have been unable to identify anything that hints at the issue that we 
have experienced.

the closest one would be:
(9.1.16,9.4.2,9.3.7,9.2.11,9.0.20) Fix possible failure during hash index 
bucket split, if other processes are modifying the index concurrently (Tom Lane)

but that seems like a long shot.

My question is:
- does anyone know of bug reports (fixed or otherwise) that could shed light on 
our issue?
- does anyone have experience with this kind of behavior?

- our setup is fairly automated; would it be a good idea to always run vacuum 
analyze after a pg_restore (or before pg_dump)?


Reply via email to