I've brought this back on-list, probably best that way..?
Eric Parusel wrote:
Tom Lane wrote:
What it kinda looks like from here is that you suffered a "page tear": the itemid pointers at the front of the page may be self-consistent, but they don't quite match the state of the rest of the page. For instance the claimed item-2 header is obviously bogus but it looks like there is a valid header starting a few bytes after where the itemid points. I suspect that the itemid pointers are one generation earlier or later than the remainder of the page. Since disks typically write in 512-byte sectors and there is nothing else in the first 512 bytes except the itemids, we could imagine that that sector got written and then the rest of the page did not. Postgres is supposed to protect against this sort of thing in case of a system crash, but I wouldn't want to swear that the protections are completely bulletproof. Have you had any power failures or system crashes lately? What sort of hardware and OS is this on?
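To make the sector arithmetic in the paragraph above concrete, here is a minimal sketch. It is illustrative only: the 8192-byte page size, the 20-byte page header and the 4-byte itemid size are assumptions about the 8.0 on-disk layout, not values taken from this thread, and the function names are made up for the example.

```python
SECTOR_SIZE = 512       # bytes; the typical disk sector size mentioned above
PAGE_SIZE = 8192        # ASSUMPTION: default PostgreSQL block size
PAGE_HEADER_SIZE = 20   # ASSUMPTION: page header size in the 8.0 layout
ITEM_ID_SIZE = 4        # ASSUMPTION: size of one itemid (line pointer)


def sectors_per_page(page_size=PAGE_SIZE, sector_size=SECTOR_SIZE):
    """Number of disk sectors that one page is split across when written."""
    return page_size // sector_size


def item_ids_in_first_sector(sector_size=SECTOR_SIZE,
                             header_size=PAGE_HEADER_SIZE,
                             item_id_size=ITEM_ID_SIZE):
    """How many itemids fit in the first sector, after the page header."""
    return (sector_size - header_size) // item_id_size


def sector_of(offset_in_page, sector_size=SECTOR_SIZE):
    """Index of the sector (0-based) that holds a given byte offset of the page."""
    return offset_in_page // sector_size
```

Under these assumptions a page is written as 16 separate sectors, and the first one has room for the header plus roughly 120 itemids. If tuple data never reaches down into that first sector, a write that lands sector 0 but not the others would leave exactly the mismatch described: itemids from one version of the page pointing into tuple data from another.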
Hmm... Here is some system information:
Dell PE1750, 2GB ECC RAM, 2x73GB 10K SCSI disks attached to a PERC 4/Di (RAID-on-motherboard, LSI MegaRAID chipset, battery-backed cache, write-back cache enabled); firmware and drivers are up to date as of a month ago.
The OS is RHEL3, kept up to date with the newest kernel for it.
PgSQL 8.0.1 installed from the RPMs on postgresql.org; it had 8.0.0 installed from the PGDG RPMs initially, until 8.0.1 came out.
No power failures or crashes since it's been up...
It's been up and running with moderate to heavy load for about 2 months now.
I don't think there have been any pgsql backend (if that's the word for them) processes crashing or anything of that sort...
Pretty heavy write load on the box; it will be getting a 14-disk RAID 10 array plugged into it soon to speed things up.
I can't remember and I couldn't find it, but is there a consistency checking tool (pg_fsck or something?) for pgsql? Or I suppose a dump of the whole database (which I do nightly) ensures all the data is readable...
If there's anything else I can do to help figure this out, let me know..
Thanks, Eric
How would I go about double checking I don't have this problem on other pages? As above, a successful db dump would verify everything's fine?
I suppose a dump and reload after that point would verify that my indexes and anything else in base/ is fine?
How would I figure out where and how much to overwrite with dd if I was to clear this page? Or how would I set the invalid item's itemid to empty?
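For the "where and how much" part of the dd question, a rough sketch of the arithmetic follows. It is not a tested recovery procedure. It assumes the default 8192-byte block size and 1 GB segment files (131072 blocks each); the helper names are hypothetical, and it only builds a command string for inspection, it does not run anything.

```python
BLOCK_SIZE = 8192             # ASSUMPTION: default block size (BLCKSZ)
BLOCKS_PER_SEGMENT = 131072   # ASSUMPTION: 1 GB segment files / 8 kB blocks


def locate_block(block_number):
    """Return (segment_index, block_in_segment, byte_offset_in_segment)."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment, block_in_segment * BLOCK_SIZE


def dd_zero_command(relfile, block_number):
    """Build (but do not run) a dd command that would overwrite one block.

    relfile is the path of the relation's first segment file; later
    segments are assumed to be named relfile.1, relfile.2, and so on.
    """
    segment, block_in_segment, _ = locate_block(block_number)
    path = relfile if segment == 0 else f"{relfile}.{segment}"
    # conv=notrunc keeps dd from truncating the file after the written block.
    return (f"dd if=/dev/zero of={path} bs={BLOCK_SIZE} "
            f"seek={block_in_segment} count=1 conv=notrunc")
```

For example, `dd_zero_command("base/<dboid>/<relfilenode>", 1234)` (placeholder path, made-up block number) yields a command with `bs=8192 seek=1234 count=1`, i.e. exactly one page, starting 1234 * 8192 bytes into the file. Anything like this would only make sense with the server stopped and a copy of the file taken first.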
Obviously, stuff like this tends not to be in the documentation :D
Thanks for the help, Eric