I have a database that wouldn't start because the disk filled up back on
1/10, which we didn't notice until 2/27. This is Jira, so it's critical data.
It appears Jira was running from memory that entire time.
I needed to run pg_resetxlog -f in order to start the database. It
started, but upon logging in I found the system catalog and some data to be
corrupt.
I was able to run a pg_dumpall on the database and restore it to a
re-initialized cluster. However, there were 3 primary key errors during
the restore, because duplicate data had gotten into the tables.
My hypothesis is that, because of the system catalog corruption, primary
key uniqueness was not being enforced. I'm not sure when this occurred, though:
1) right after the disk filled up, 2) when I ran pg_resetxlog -f, or 3) after
I ran pg_resetxlog and before I did the backup. Jira was still running
after I got the database started, and I waited a few hours to do the backup.
My guess is the duplicate data got in there right after the disk filled up on 1/10.
We had a snapshot from 1/5, which has been restored to production, such as it is.
But they created another test VM for me to attempt to bring data back to
Is there anything I can do short of pg_resetxlog -f to bring this database
back up more safely, and possibly avoid the duplicate data/primary key
errors? It wouldn't start without the force option. Should I simply shut
down Jira, try pg_resetxlog -f again and do the pg_dumpall immediately?
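To make that last question concrete, here is a rough sketch of the sequence I have in mind. The paths and the helper names `run` and `retry_reset` are placeholders I made up for illustration, not anything that exists on the system, and it assumes Jira has already been stopped. It defaults to a dry run that only prints the commands. I don't know whether this ordering actually avoids the duplicates; the file-level copy is only there so the attempt can be repeated from the same starting point.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical placeholders.
PGDATA="${PGDATA:-/var/lib/pgsql/data}"          # damaged cluster on the test VM
COPY="${COPY:-/backup/pgdata-before-reset.tar}"  # should live on a different volume
DUMP="${DUMP:-/backup/jira-dumpall.sql}"
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands, do not run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

retry_reset() {
    # Assumes Jira is already shut down and postgres is not running.
    run tar -cf "$COPY" -C "$PGDATA" .     # file-level copy before touching anything
    run pg_resetxlog -f "$PGDATA"          # force the WAL reset
    run pg_ctl -D "$PGDATA" -w start       # start and wait until it accepts connections
    run pg_dumpall -f "$DUMP"              # dump immediately, before any client writes
    run pg_ctl -D "$PGDATA" -m fast stop   # stop again so nothing else modifies it
}

retry_reset
```

Running it with DRY_RUN=0 would execute the steps for real; as written it only lists them.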
These are the errors I am currently seeing while trying to start the
database:
2018-03-02 11:01:06 CST LOG: database system was interrupted; last known up at 2018-01-10 12:19:01 CST
2018-03-02 11:01:06 CST LOG: database system was not properly shut down; automatic recovery in progress
2018-03-02 11:01:06 CST LOG: redo starts at 36/B8556D58
2018-03-02 11:01:06 CST LOG: incomplete startup packet
2018-03-02 11:01:07 CST FATAL: the database system is starting up
2018-03-02 11:01:12 CST LOG: incomplete startup packet
2018-03-02 11:01:29 CST FATAL: the database system is starting up
2018-03-02 11:01:30 CST LOG: record with zero length at 36/F754CBD8
2018-03-02 11:01:30 CST LOG: redo done at 36/F754CBA8
2018-03-02 11:01:30 CST LOG: last completed transaction was at log time
Any ideas or thoughts are appreciated.