On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> Justin Pryzby <[email protected]> writes:
> > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT
> > datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic
> > pg_attrdef pg_constraint; do echo "$db.$t...";
> > PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done;
> > done; done
>
> > ...
> > postgres.pg_statistic...
> > postgres.pg_attrdef...
> > postgres.pg_constraint...
> > template1.pg_statistic...
> > template1.pg_attrdef...
> > template1.pg_constraint...
> > ts.pg_statistic...
> > ERROR: canceling statement due to statement timeout
> > ts.pg_attrdef...
> > ts.pg_constraint...
> > postgres.pg_statistic...
> > ERROR: missing chunk number 0 for toast value 3372855171 in pg_toast_2619
>
> Hm, so was the timeout error happening every time through on that table,
> or just occasionally, or did you provoke it somehow? I'm wondering how
> your 9s timeout relates to the expected completion time.
Actually statement_timeout isn't essential for this (maybe it helps to triggers
it more often - not sure).
Could you try:
time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE
pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic";
done'; date
Three servers experienced error within 30min, but one server didn't fail until
12h later, and a handful others still haven't failed..
Does this help at all ?
2018-05-24 21:57:49.98-03 | 5b075f8d.1ad1 | LOG | pryzbyj |
postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO | pryzbyj |
postgres | vacuuming "pg_toast.pg_toast_2619"
2018-05-24 21:57:50.09-03 | 5b075f8d.1ad1 | INFO | pryzbyj |
postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in
99 pages
2018-05-24 21:57:50.12-03 | 5b075f8e.1ada | LOG | pryzbyj |
postgres | statement: VACUUM FULL VERBOSE pg_statistic
2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO | pryzbyj |
postgres | vacuuming "pg_catalog.pg_statistic"
2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR | pryzbyj |
postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619
Some thing; this server has autovacuum logging, although it's not clear to me
if that's an essential component of the problem, either:
2018-05-24 21:16:39.856-06 | LOG | 5b078017.7b99 | pryzbyj | postgres |
statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
2018-05-24 21:16:39.876-06 | LOG | 5b078010.7968 | | |
automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1
+
| | | | |
pages: 0 removed, 117 r
2018-05-24 21:16:39.909-06 | INFO | 5b078017.7b99 | pryzbyj | postgres |
vacuuming "pg_toast.pg_toast_2619"
2018-05-24 21:16:39.962-06 | INFO | 5b078017.7b99 | pryzbyj | postgres |
"pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages
2018-05-24 21:16:40.025-06 | LOG | 5b078018.7b9b | pryzbyj | postgres |
statement: VACUUM FULL VERBOSE pg_statistic
2018-05-24 21:16:40.064-06 | INFO | 5b078018.7b9b | pryzbyj | postgres |
vacuuming "pg_catalog.pg_statistic"
2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj | postgres |
missing chunk number 0 for toast value 765874692 in pg_toast_2619
Or this one?
postgres=# SELECT log_time, database, user_name, session_id, left(message,999)
FROM postgres_log WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24
19:58') AND (database='postgres' OR database IS NULL OR user_name IS NULL OR
user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1;
log_time | 2018-05-24 19:56:35.396-04
database |
user_name |
session_id | 5b075131.3ec0
left | skipping vacuum of "pg_toast_2619" --- lock not available
...
log_time | 2019-05-24 19:57:35.78-04
database |
user_name |
session_id | 5b07516d.445e
left | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index
scans: 1
: pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped
frozen
: tuples: 1 removed, 348 remain, 0 are dead but not yet removable,
oldest xmin: 63803106
: buffer usage: 179 hits, 4 misses, 87 dirtied
: avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s
: system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s
log_time | 2018-05-24 19:57:35.879-04
database | postgres
user_name | pryzbyj
session_id | 5b07516f.447f
left | missing chunk number 0 for toast value 624341680 in pg_toast_2619
log_time | 2018-05-24 19:57:44.332-04
database |
user_name |
session_id | 5af9fda3.70d5
left | checkpoint starting: time
Justin