"Harvell F" <[EMAIL PROTECTED]> writes:
> Just as a follow-up, it turns out that our fiberchannel RAID was powered off
> while the systems were up and running. There are several write errors in the
> postgresql log.
> Now I'm off to try to recover the data...

That's still a problem. It indicates either a bug in Postgres or -- sadly more
likely -- a problem with your hardware or system software setup. In a working
system, Postgres guarantees that a situation like that will result in
transactions failing to commit (either by raising errors or by hanging), not in
corrupted data. Data, once committed, should never be lost.

For this to happen, something in your software or hardware setup must be
caching writes and then hiding the errors from Postgres. For instance, systems
where fsync lies and reports success before the data has actually been written
to disk can end up with silently corrupted data after any power outage or
system crash.
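
To illustrate the contract I mean, here is a minimal Python sketch. It is not
Postgres's actual code, and durable_write is just a name made up for this
example. The point is that an application can treat data as safely stored only
after fsync returns without error. If any layer below (OS, controller, or
drive cache) reports success early, this code has no way to notice.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and return only once fsync has reported success.

    Hypothetical helper for illustration; any OSError from write or fsync
    propagates to the caller, which must then treat the write as failed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the OS to flush the file to stable storage. This raises
        # OSError if the kernel reports a write failure. If the storage
        # stack lies here, the caller is told "done" when it is not.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A caller would only acknowledge a commit after durable_write returns normally.
If it raises, the commit must be reported as failed rather than assumed done.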

Could you send the write errors, or at least the first page or so of them?
Please also check the system logs from that time for any lower-level errors.

What kind of drives are in the Fibre Channel RAID? Are they SCSI, PATA, or
SATA? Can you check their configuration at all, or does the RAID hide all that
from you? Does the RAID have a battery-backed cache?