Hi Kevin,

Thanks for the response.

> Now we're getting somewhere. The disk drive "became corrupt" while
> PostgreSQL was running? Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first? Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?

The failure basically happened because the Django webapp we're running
isn't effectively closing database connections, so memory completely
fills up and causes the server to hang. Yesterday, when this happened,
it made the entire network interface inoperable, which meant the iSCSI
connection to the shared drive stopped working and the data became
corrupt. I stopped the postgresql service before unmounting and
remounting the target.
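The interim fix I have in mind on the Django side is to close the ORM
connection explicitly when our long-running jobs finish. Just a sketch
-- process_batch and items are placeholders, not our real code:

    # Sketch: explicitly close the ORM connection when a long-running
    # job finishes, instead of leaking it. process_batch/items are
    # placeholders for our actual code.
    from django.db import connection

    def process_batch(items):
        try:
            for item in items:
                item.save()        # any ORM call may open a connection
        finally:
            connection.close()     # release the connection when done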
My first concern is restoring the database. I'll fix the problems with
Django and Apache later; I can deal with those. I'm also going to set
up a series of database backups that can be used to quickly restore
data if this happens again (sketch below). My concern right now is
simply getting this back to baseline.
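For those recurring backups, this is roughly what I'm planning to run
nightly from cron -- a rough sketch only; the backup directory and
database name are placeholders:

    #!/usr/bin/env python
    # Rough sketch of a nightly dump job, run from cron as postgres.
    # BACKUP_DIR and DBNAME are placeholders, not our real values.
    import os
    import subprocess
    import time

    BACKUP_DIR = '/var/lib/pgsql/backups'
    DBNAME = 'mydb'

    stamp = time.strftime('%Y%m%d-%H%M%S')
    outfile = os.path.join(BACKUP_DIR, '%s-%s.dump' % (DBNAME, stamp))

    # -Fc writes a compressed archive that pg_restore can restore
    rc = subprocess.call(['pg_dump', '-Fc', '-f', outfile, DBNAME])
    if rc != 0:
        raise SystemExit('pg_dump failed with exit code %d' % rc)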
> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL which is only about a month
> old. Was 8.1.21 the version you were running at the time of the
> failure, or have you upgraded during the recovery attempt? If you've
> upgraded, the version in use when the corruption occurred could be
> relevant.

This storage server has RAID and there are backups; it just so happens
that the most recent usable backup is from June 20th. I completely
forgot to configure the backups on this server. I normally wouldn't
make this mistake, but I did this time. On the version: this is the
version that comes standard with CentOS 5.5. This was a clean CentOS
5.5 install and it's been live for about a month.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

The only uncommented lines are:

max_connections = 500
shared_buffers = 4000
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
redirect_stderr = on
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.

I have a full backup of the entire directory structure, taken shortly
after the database became unusable.

On Wed, Jun 30, 2010 at 10:14 AM, Kevin Grittner
<[email protected]> wrote:

> Nathan Robertson <[email protected]> wrote:
>
> > There was a cascade effect. Apache failed which caused the server
> > overall to fail. The data is stored on an iSCSI drive and the
> > mount of the iSCSI drive became corrupt when everything failed. I
> > was able to remount the drive and get access to data now I have
> > this index error.
>
> Now we're getting somewhere. The disk drive "became corrupt" while
> PostgreSQL was running? Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first? Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?
>
> Also, how confident are you that the Apache failure caused the drive
> to be corrupted? That sounds *much* less likely than the other way
> around. Without understanding that better, fixing one particular
> problem in the database on this machine might be like rearranging
> deck chairs on a sinking ship.
>
> > So, this is where I'm at. If anyone could help resolve the index
> > cache error I would be eternally grateful.
>
> We'd like to help, and perhaps someone else can suggest something on
> the basis of information you've provided so far, but I'm not
> comfortable suggesting something without a little more of a sense of
> what happened and what your configuration is.
>
> >> Also, it would help a lot to know what your postgresql.conf file
> >> contains (excluding all comments).
>
> This would still be useful.
>
> >> But first and foremost, you should make a file-copy backup of
> >> your entire PostgreSQL data directory tree with the PostgreSQL
> >> server stopped, if you haven't done that already. Any attempt at
> >> recovery may misfire, and you might want to get back to what you
> >> have now.
>
> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.
>
> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL which is only about a month
> old. Was 8.1.21 the version you were running at the time of the
> failure, or have you upgraded during the recovery attempt? If you've
> upgraded, the version in use when the corruption occurred could be
> relevant.
>
> -Kevin
