I've brought this back on-list; that's probably best.

Eric Parusel wrote:
Tom Lane wrote:

What it kinda looks like from here is that you suffered a "page tear":
the itemid pointers at the front of the page may be self-consistent, but
they don't quite match the state of the rest of the page.  For instance
the claimed item-2 header is obviously bogus but it looks like there is
a valid header starting a few bytes after where the itemid points.
I suspect that the itemid pointers are one generation earlier or later
than the remainder of the page.  Since disks typically write in 512-byte
sectors and there is nothing else in the first 512 bytes except the
itemids, we could imagine that that sector got written and then the rest
of the page did not.  Postgres is supposed to protect against this sort
of thing in case of a system crash, but I wouldn't want to swear that
the protections are completely bulletproof.  Have you had any power
failures or system crashes lately?  What sort of hardware and OS is this
on?
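
To make the mismatch described above concrete, here is a minimal sketch of how an itemid (line pointer) could be decoded and sanity-checked against the rest of the page. It assumes the 4-byte layout of lp_off (15 bits), lp_flags (2 bits), lp_len (15 bits) as packed on a little-endian machine, and it takes pd_upper and pd_special as inputs rather than parsing the page header. The function names are hypothetical and not part of any PostgreSQL tool.

```python
import struct

def decode_itemid(raw4):
    """Split one 4-byte line pointer into (lp_off, lp_flags, lp_len).

    Assumption: little-endian bitfield packing, lp_off in the low 15 bits,
    lp_flags in the next 2 bits, lp_len in the top 15 bits.
    """
    (word,) = struct.unpack("<I", raw4)
    lp_off = word & 0x7FFF
    lp_flags = (word >> 15) & 0x3
    lp_len = (word >> 17) & 0x7FFF
    return lp_off, lp_flags, lp_len

def itemid_in_bounds(lp_off, lp_len, pd_upper, pd_special):
    """True if an in-use item lies entirely in the tuple area.

    The tuple area is taken to run from pd_upper up to pd_special.
    Passing this check does not prove the tuple header at lp_off is valid;
    it only catches pointers that cannot possibly be right.
    """
    return pd_upper <= lp_off and lp_off + lp_len <= pd_special
```

A pointer that passes the bounds check but lands a few bytes before a valid-looking tuple header would match the "one generation earlier or later" symptom described above.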


Hmm...
Here is some system information:

Dell PE1750, 2GB ECC RAM, 2x73GB 10K SCSI disks attached to a Perc4/di (RAID-on-motherboard, LSI MegaRAID chipset, battery-backed cache, write-back cache enabled). Firmware and drivers are up to date as of a month ago.

The OS is RHEL3, kept up to date with the newest kernel for it.

PgSQL 8.0.1 installed from the RPMs on postgresql.org. It initially had 8.0.0 installed from the PGDG RPMs, until 8.0.1 came out.

No power failures or crashes since it's been up...

It's been up and running with moderate to heavy load for about 2 months now.

I don't think there have been any pgsql backend (if that's the word for them) processes crashing or anything of that sort...

There is a pretty heavy write load on the box. It will be getting a 14-disk RAID 10 array plugged into it soon to speed things up.



I can't remember and I couldn't find it, but is there a consistency checking tool (pg_fsck or something?) for pgsql? Or I suppose a dump of the whole database (which I do nightly) at least ensures all the table data is readable...

If there's anything else I can do to help figure this out, let me know.

Thanks,
Eric


How would I go about double-checking that I don't have this problem on other pages? As above, would a successful db dump verify that everything's fine?
I suppose a dump and reload after that point would verify that my indexes and anything else in base/ are fine?


How would I figure out where and how much to overwrite with dd if I was to clear this page? Or how would I set the invalid item's itemid to empty?
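
For the "where and how much" part, the arithmetic can be sketched as below. This assumes the default 8192-byte block size and 1GB relation segment files (so the on-disk file for later segments gets a ".1", ".2", ... suffix); both are build-time settings and should be confirmed for the installation. The function names, the relfilenode, and the path are hypothetical placeholders. The sketch only builds the dd command string and does not run anything; zeroing a page discards every row on it, so the server would need to be stopped and the file copied first.

```python
BLOCK_SIZE = 8192                                 # assumed default block size
BLOCKS_PER_SEGMENT = (1024 ** 3) // BLOCK_SIZE    # assumed 1GB segment files

def locate_block(relfilenode, block_number):
    """Map a relation block number to (file name, block in file, byte offset)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are named "<relfilenode>.<n>".
    filename = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return filename, block_in_segment, block_in_segment * BLOCK_SIZE

def dd_zero_command(path, block_in_segment):
    """Build (but do not run) a dd command that zeroes exactly one block."""
    return (
        f"dd if=/dev/zero of={path} bs={BLOCK_SIZE} "
        f"seek={block_in_segment} count=1 conv=notrunc"
    )

# Hypothetical example: block 5 of a relation whose file is base/1/16384.
name, blk, offset = locate_block(16384, 5)
print(name, blk, offset)
print(dd_zero_command(f"base/1/{name}", blk))
```

With bs set to the block size, seek counts in blocks, so one page is overwritten with count=1; conv=notrunc keeps dd from truncating the rest of the file.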

Obviously, stuff like this tends not to be in the documentation :D

Thanks for the help,
Eric
