On Sun, Feb 7, 2016 at 4:51 AM, Karsten Hilbert <karsten.hilb...@gmx.net> wrote:
> Just a shot in the dark for a possible lead to follow down (sorry for > top-posting): > > Is there index corruption on system tables ? > > (like, several index entries pointing to the one template0 row) > > Karsten > > > *Gesendet:* Sonntag, 07. Februar 2016 um 03:43 Uhr > *Von:* "Kazuaki Fujikura" <fu...@fujya.com> > *An:* pgsql-general@postgresql.org > *Betreff:* [GENERAL] four template0 databases after vacuum > Hi there, > > Version: 9.1.6 running since Dec, 2014 > We have 3 different databases. > > [problem history/background] > > Jan 10th, 2016: > The first problem was autovacuum issue. > - autovacuum could not finish successfully. > - I set autovacuum_freeze_max_age to 2 hundreds million. > - autovacuum immediately finished against the database which age was over > 2 hundreds million. > - so, autovacuum did not go next database > - I then run "vacuumdb -az" and run vacuum freeze analyze against > template0 after setting datallowconn to true > - I set datallowconn to false > > ============================================================ > $ vacuumdb -az > $ psql template1 > template1=# UPDATE pg_database SET datallowconn = TRUE where datname = > 'template0'; > UPDATE 1 > template1=# \c template0 > template0=# VACUUM FREEZE ANALYZE ; > VACUUM > template0=# \c template1 > template1=# UPDATE pg_database SET datallowconn = FALSE where datname = > 'template0'; > UPDATE 1 > ============================================================ > > - At this point, there is one template0 only > > > Jan 30th, 2016: > After three weeks, I again hit the same issue - autovacuum could not > finish. > This time, the age did not reduce with manual vacuum. > I then run vacuum full to pg_database. The age of pg_database becomes > minus value. > Then, autovacuum started again. > > ============================================================ > target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind > = 'r' ; > relname | age > ------------------------------------------------+----------- > pg_database | 219383067 > target_db=# VACUUM FREEZE ; > VACUUM > target_db=# SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind > = 'r' ; > relname | age > ------------------------------------------------+----------- > pg_database | 219387307 > target_db=# VACUUM FULL ; > VACUUM > relname | age > ------------------------------------------------+----------- > pg_database | -1861408089 > ============================================================ > > > > Yesterday: > > I run the following command to run vacuum full to all pg_database. > Then, I run vacuum freeze analyze and vacuum full to template0 after > setting datallowconn. > > ============================================================ > $ psql -lt | awk '{print $1}' | grep -v ^$ | grep -v ^\| | while read > line; do psql ${line} -c "VACUUM FULL pg_database;"; done > VACUUM .... > > $ psql template1 > template1=# UPDATE pg_database SET datallowconn = TRUE where datname = > 'template0'; > UPDATE 1 > template1=# \c template0 > template0=# VACUUM FULL ; > VACUUM > template0=# \c template1 > template1=# UPDATE pg_database SET datallowconn = FALSE where datname = > 'template0'; > UPDATE 4 > ============================================================ > > ***Then I HAVE FOUR template0 DATABASES*** > > > > [Current problems] > > We now have three issues in our production. > > 1. It looks four template0 databases exist > 2. Xid of template0 keeps growing > 3. Can not freeze xid of template0 > > > > 1. It looks four template0 databases exist > > ============================================================ > $ psql -l | grep template0 > template0 | postgres | UTF8 > | C | C | =c/postgres + > template0 | postgres | UTF8 > | C | C | =c/postgres + > template0 | postgres | UTF8 > | C | C | =c/postgres + > template0 | postgres | UTF8 > | C | C | =c/postgres + > ============================================================ > > These have same dataid. > > ============================================================ > postgres=# SELECT datid, datname FROM pg_stat_database where datname = > 'template0'; > datid | datname > -------+----------- > 12772 | template0 > 12772 | template0 > 12772 | template0 > 12772 | template0 > (4 rows) > ============================================================ > > > > 2. Xid of template0 keeps growing > > ============================================================ > postgres=# SELECT datname, age(datfrozenxid) FROM pg_database order by age > desc; > datname | age > ------------------------------------------------+----------- > template0 | 198431852 > template0 | 198431852 > template0 | 198431852 > template0 | 50480024 > template1 | 45629585 > ============================================================ > > At this moment, the maximum age value of all databases is template0. > The age value keeps growing. > > One of 4 template0 is young (504080024). Other three template0s are still > old. > > > 3. Can not freeze xid of template0 > > To reset xid of template0, I did vacuum full/ vacuum freeze to template0. > But, > the age of three template0 did not change. Only of of 4 template0 had > successfully > changed the age young. > > ============================================================ > $ psql template1 > template1=# UPDATE pg_database SET datallowconn = TRUE where datname = > 'template0'; > UPDATE 4 > template1=# \c template0 > template0=# VACUUM FREEZE ANALYZE ; > VACUUM > template0=# VACUUM FULL ; > VACUUM > template0=# \c template1 > template1=# UPDATE pg_database SET datallowconn = FALSE where datname = > 'template0'; > UPDATE 4 > ============================================================ > > I run the commands above. But, I could not change the age of three > template0 databases. > > > > [My idea to fix this] > > If I don't do anything about this, I think our production service will be > down because it exceeds the limit of xid. > > I guess if I drop all template0 and create template0 again, then > everything gets back normal. > But I am not quite sure if my approach is right. > > I would appreciate any suggestion/comments. > > Best regards, > Kazuaki Fujikura > With regards to Karsten's thought, here is a query to find any pg_catalog indexes that are corrupt. SELECT n.nspname as schema, i.relname as table, i.indexrelname as index, i.idx_scan, i.idx_tup_read, i.idx_tup_fetch, CASE WHEN idx.indisprimary THEN 'pkey' WHEN idx.indisunique THEN 'uidx' ELSE 'idx' END AS type, pg_get_indexdef(idx.indexrelid), CASE WHEN idx.indisvalid THEN 'valid' ELSE 'INVALID' END as statusi, pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname)) as size_in_bytes, pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname))) as size FROM pg_stat_all_indexes i JOIN pg_class c ON (c.oid = i.relid) JOIN pg_namespace n ON (n.oid = c.relnamespace) JOIN pg_index idx ON (idx.indexrelid = i.indexrelid ) WHERE n.nspname = 'pg_catalog' AND NOT idx.indisvalid ORDER BY 1, 2, 3; -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.