[HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Joe Conway
We have very strange behavior from an internal production database. There are multiple symptoms, all pointing to a problem with clusterwide tables. For example: [EMAIL PROTECTED]:~ psql -U postgres -p 5433 cyspec Welcome to psql 7.4.8, the PostgreSQL interactive terminal. Type: \copyright

Re: [HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: We have very strange behavior from an internal production database. There are multiple symptoms, all pointing to a problem with clusterwide tables. For example: psql -l really should produce the same results as doing \l in the template1 database. Does it?

Re: [HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: So they agree in template1 and cyspec databases. OK, in that case I'd wonder about whether you've suffered XID wraparound in pg_database and/or pg_shadow. The typical symptom of this is that entries are valid from the system's point of view but not visible

Re: [HACKERS] strange behavior (corruption?) of large production

2005-12-02 Thread Joe Conway
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: So they agree in template1 and cyspec databases. OK, in that case I'd wonder about whether you've suffered XID wraparound in pg_database and/or pg_shadow. The typical symptom of this is that entries are valid from the system's point of

Re: [HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: Tom Lane wrote: You could try a VACUUM FREEZE on pg_database though. Since this is a production machine, putting pg_filedump on it may be problematic -- if I grovel through the bits by hand, can you give me a hint about what to look for? How about you

Re: [HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: Since this database has many large, but static tables (i.e. new data is loaded each day, but the tables are partitioned into year-month tables), I'm thinking we can run VACUUM FREEZE on the whole database once, and then run VACUUM FREEZE periodically on

Re: [HACKERS] strange behavior (corruption?) of large production

2005-12-02 Thread Joe Conway
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: Since this database has many large, but static tables (i.e. new data is loaded each day, but the tables are partitioned into year-month tables), I'm thinking we can run VACUUM FREEZE on the whole database once, and then run VACUUM FREEZE

Re: [HACKERS] strange behavior (corruption?) of large production database

2005-12-02 Thread Tom Lane
Joe Conway [EMAIL PROTECTED] writes: cyspec=# vacuum freeze pg_catalog.pg_class; ERROR: failed to re-find parent key in pg_class_relname_nsp_index It seems that we cannot vacuum pg_class, because vacuum itself fails. Any suggestions on how to bootstrap the fixing of pg_class? REINDEX?

Re: [HACKERS] strange behavior (corruption?) of large production

2005-12-02 Thread Joe Conway
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: cyspec=# vacuum freeze pg_catalog.pg_class; ERROR: failed to re-find parent key in pg_class_relname_nsp_index It seems that we cannot vacuum pg_class, because vacuum itself fails. Any suggestions on how to bootstrap the fixing of