Hi,

We've had a hardware failure at the data center; the provider said that "the
chip" burned.
Most systems came back online after a new mainboard was installed.

However, our test database cluster seems to be broken.
When I try to start the cluster, it says:
2015-11-16 15:06:35 CET db: ip: us: LOG:  database system was interrupted while in recovery at 2015-11-16 13:05:41 CET
2015-11-16 15:06:35 CET db: ip: us: HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2015-11-16 15:06:35 CET db: ip: us: LOG:  database system was not properly shut down; automatic recovery in progress
2015-11-16 15:06:35 CET db: ip: us: PANIC:  unexpected pageaddr A/6E786000 in log segment 000000010000000A00000084, offset 7888896
2015-11-16 15:06:35 CET db: ip: us: LOG:  startup process (PID 11634) was terminated by signal 6: Aborted
2015-11-16 15:06:35 CET db: ip: us: LOG:  aborting startup due to startup process failure
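
For reference, here is a rough sketch of how the numbers in the PANIC line
relate to each other. It assumes the default 16 MB WAL segment size and the
9.3 segment file naming (timeline, log id and segment number as 8 hex digits
each). The function names are my own and not anything from PostgreSQL.

```python
# Sketch only: assumes 16 MB WAL segments and 9.3-style segment file names.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def parse_lsn(text):
    """Turn a WAL position such as 'A/6E786000' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def format_lsn(position):
    """Format a 64-bit WAL position the way the server log prints it."""
    return "%X/%X" % (position >> 32, position & 0xFFFFFFFF)


def expected_page_address(segment_name, offset, seg_size=WAL_SEGMENT_SIZE):
    """WAL position that a page at `offset` in `segment_name` should carry."""
    # Characters 0-7 hold the timeline id, which is not needed here.
    log_id = int(segment_name[8:16], 16)
    seg_no = int(segment_name[16:24], 16)
    return (log_id << 32) + seg_no * seg_size + offset


if __name__ == "__main__":
    expected = expected_page_address("000000010000000A00000084", 7888896)
    found = parse_lsn("A/6E786000")
    print("expected page address:", format_lsn(expected))
    print("found page address:   ", format_lsn(found))
    print("segments apart:       ", (expected - found) // WAL_SEGMENT_SIZE)
```

Under those assumptions, the page at that offset in segment
000000010000000A00000084 should carry address A/84786000, while the header
found there says A/6E786000: the same offset within a segment, but 22
segments earlier.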

The cluster is in an OpenVZ container and runs Ubuntu 14.04.
The PostgreSQL version is 9.3.
Three other clusters in the same container are fine.
We use a hardware RAID10 of SATA disks with a BBU (in write-back mode).

Relevant settings:
#wal_level = minimal
#fsync = on
#synchronous_commit = on
#wal_sync_method = fsync
#full_page_writes = on
#wal_buffers = -1
#wal_writer_delay = 200ms
#commit_delay = 0
#commit_siblings = 5

# - Checkpoints -

checkpoint_segments = 10
#checkpoint_timeout = 5min
#checkpoint_completion_target = 0.5

I'd like to know in a bit more detail what could have caused this
corruption, if possible.
As far as I can see, there is a bad memory address in the WAL; is that
correct?

Cheers,

-- 
Willy-Bas Loos
