Hi Ian,
I think it is important to figure out why this is happening. I would not want to run any production databases on systems that were failing like this.
I am trying to work out the likely causes of the errors:
1) Do any other computers in your building suffer random application crashes, power-downs, etc.?
2) I take it there are no RAID controllers involved?
3) Is the RAM non-ECC?
4) Are the systems on UPSs?
If I could make a wild (and probably wrong) guess, I would wonder whether something external to the system (like the electrical supply) was introducing glitches into memory, causing bad data to be written. I am only mentioning it because I have implicated the electrical supply in other cases where rare computer failures were affecting many systems...
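It might also help to know whether the damaged blocks are entirely zero-filled or contain garbage. Below is a rough sketch in Python for checking that, not a tested diagnostic tool. It assumes the default 8192-byte block size, and the function name `find_zero_blocks` is made up for illustration.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def find_zero_blocks(path, block_size=BLOCK_SIZE):
    """Return the numbers of the blocks in `path` that contain only zero bytes."""
    zero_blocks = []
    with open(path, "rb") as f:
        block_no = 0
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            # A block counts as zero-filled when every byte in it is 0.
            if block.count(0) == len(block):
                zero_blocks.append(block_no)
            block_no += 1
    return zero_blocks
```

You would run it against a copy of one of the relation files named in the error messages, preferably with the postmaster stopped, and compare the reported block numbers with the ones in the errors.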
Ian Westmacott wrote:
For several weeks now we have been experiencing fairly severe database corruption upon a clean reboot. It is very repeatable, and the corruption takes the following forms:
ERROR: could not access status of transaction foo
DETAIL: could not open file "bar": No such file or directory
ERROR: invalid page header in block foo of relation "bar"
ERROR: uninitialized page in block foo of relation "bar"
At first, we believed this was related to XFS, and have been pursuing investigations along those lines. However, we have now experienced the exact same problem with JFS.
Here are some details:
- Postgres 7.4.2
- 2.6.6 kernel.org kernel
- dedicated database partition
- repeatable with XFS and JFS (have not seen on ext3)
- repeatable with and without Linux software RAID 0
- repeatable with IDE and SATA
- repeatable with and without fsync, and with fdatasync
- repeatable on multiple systems
I have two questions:
- Is there any known reason why this might be occurring? (We must have something wrong to be seeing such a high rate of severe errors.)
- if I don't care about losing data, and am not interested in trying to recover anything, how can I arrange for Postgres to proceed normally? I know about zero_damaged_pages, but this doesn't help with missing transaction files and such. Is there any way to get Postgres to chuck anything bad and proceed?
Thanks,
--Ian