Hi Kevin,

Thanks for the response.

> Now we're getting somewhere. The disk drive "became corrupt" while
> PostgreSQL was running? Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first? Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?

The failure basically happened because the Django webapp we're running
isn't effectively closing database connections, so memory completely
fills up and causes the server to hang. Yesterday, when this happened,
it made the entire network interface inoperable, which meant the iSCSI
connection to the shared drive stopped working and the data became
corrupt. I stopped the postgresql service before unmounting and
remounting the target.
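The interim fix I have in mind on the Django side is to close the ORM
connection explicitly when our long-running jobs finish. Just a sketch
-- process_batch and items are placeholders, not our real code:

    # Sketch: explicitly close the ORM connection when a long-running
    # job finishes, instead of leaking it. process_batch/items are
    # placeholders for our actual code.
    from django.db import connection

    def process_batch(items):
        try:
            for item in items:
                item.save()        # any ORM call may open a connection
        finally:
            connection.close()     # release the connection when done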
My first concern is restoring the database. I'll fix the problems with
Django and Apache later; I can deal with those. I'm also going to set
up a series of database backups that can be used to quickly restore
data if this happens again (sketch below). My concern right now is
simply getting this back to baseline.
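For those recurring backups, this is roughly what I'm planning to run
nightly from cron -- a rough sketch only; the backup directory and
database name are placeholders:

    #!/usr/bin/env python
    # Rough sketch of a nightly dump job, run from cron as postgres.
    # BACKUP_DIR and DBNAME are placeholders, not our real values.
    import os
    import subprocess
    import time

    BACKUP_DIR = '/var/lib/pgsql/backups'
    DBNAME = 'mydb'

    stamp = time.strftime('%Y%m%d-%H%M%S')
    outfile = os.path.join(BACKUP_DIR, '%s-%s.dump' % (DBNAME, stamp))

    # -Fc writes a compressed archive that pg_restore can restore
    rc = subprocess.call(['pg_dump', '-Fc', '-f', outfile, DBNAME])
    if rc != 0:
        raise SystemExit('pg_dump failed with exit code %d' % rc)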
> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL which is only about a month
> old. Was 8.1.21 the version you were running at the time of the
> failure, or have you upgraded during the recovery attempt? If you've
> upgraded, the version in use when the corruption occurred could be
> relevant.

This storage server has RAID and there are backups; it just so happens
that the most recent usable backup is from June 20th. I completely
forgot to configure the backups on this server. I normally wouldn't
make this mistake, but I did this time. On the version: this is the
version that comes standard with CentOS 5.5. This was a clean CentOS
5.5 install and it's been live for about a month.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

The only uncommented lines are:

max_connections = 500
shared_buffers = 4000
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
redirect_stderr = on
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.

I have a full backup of the entire directory structure, taken shortly
after the database became unusable.

On Wed, Jun 30, 2010 at 10:14 AM, Kevin Grittner
<[email protected]> wrote:

> Nathan Robertson <[email protected]> wrote:
>
> > There was a cascade effect. Apache failed which caused the server
> > overall to fail. The data is stored on an iSCSI drive and the
> > mount of the iSCSI drive became corrupt when everything failed. I
> > was able to remount the drive and get access to data now I have
> > this index error.
>
> Now we're getting somewhere. The disk drive "became corrupt" while
> PostgreSQL was running? Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first? Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?
>
> Also, how confident are you that the Apache failure caused the drive
> to be corrupted? That sounds *much* less likely than the other way
> around. Without understanding that better, fixing one particular
> problem in the database on this machine might be like rearranging
> deck chairs on a sinking ship.
>
> > So, this is where I'm at. If anyone could help resolve the index
> > cache error I would be eternally grateful.
>
> We'd like to help, and perhaps someone else can suggest something on
> the basis of information you've provided so far, but I'm not
> comfortable suggesting something without a little more of a sense of
> what happened and what your configuration is.
>
> >> Also, it would help a lot to know what your postgresql.conf file
> >> contains (excluding all comments).
>
> This would still be useful.
>
> >> But first and foremost, you should make a file-copy backup of
> >> your entire PostgreSQL data directory tree with the PostgreSQL
> >> server stopped, if you haven't done that already. Any attempt at
> >> recovery may misfire, and you might want to get back to what you
> >> have now.
>
> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.
>
> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL which is only about a month
> old. Was 8.1.21 the version you were running at the time of the
> failure, or have you upgraded during the recovery attempt? If you've
> upgraded, the version in use when the corruption occurred could be
> relevant.
>
> -Kevin
