On Sat, Feb 22, 2014 at 1:21 PM, Torsten Förtsch <torsten.foert...@gmx.net> wrote:
> On 21/02/14 09:17, Torsten Förtsch wrote:
> > one of our streaming replicas died with
> >
> > 2014-02-21 05:17:10 UTC PANIC: heap2_redo: unknown op code 32
> > 2014-02-21 05:17:10 UTC CONTEXT: xlog redo UNKNOWN
> > 2014-02-21 05:17:11 UTC LOG: startup process (PID 1060) was terminated
> > by signal 6: Aborted
> > 2014-02-21 05:17:11 UTC LOG: terminating any other active server
> > processes
> > 2014-02-21 05:17:11 UTC WARNING: terminating connection because of
> > crash of another server process
> > 2014-02-21 05:17:11 UTC DETAIL: The postmaster has commanded this
> > server process to roll back the current transaction and exit, because
> > another server process exited abnormally and possibly corrupted shared
> > memory.
> > 2014-02-21 05:17:11 UTC HINT: In a moment you should be able to
> > reconnect to the database and repeat your command.
> > Any idea what that means?
>
> I have got a second replica dying with the same symptoms.

The xlog record seems to be corrupted. Op code 32 represents
XLOG_HEAP2_FREEZE_PAGE, and the code to handle it exists, so I don't
know why the system is not able to recognize the op code.

Can you try running pg_xlogdump on the corrupted WAL file? Keep the
data folder so the problem can be investigated.

As it seems to be some kind of corruption, you need to take a fresh
base backup to continue.

Regards,
Hari Babu
Fujitsu Australia
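
For reference, the op code lookup mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the server's C code: the helper name describe_heap2_op is hypothetical, and the mask and op code values are assumed to match src/include/access/heapam_xlog.h in a 9.3.x tree that already contains XLOG_HEAP2_FREEZE_PAGE, so please check them against the source of the binaries you actually run.

```python
# Sketch only: values are assumed to match heapam_xlog.h in a 9.3.x
# source tree that includes XLOG_HEAP2_FREEZE_PAGE. Verify before relying on it.
XLOG_HEAP_OPMASK = 0x70  # assumed mask selecting the op code bits of xl_info

# Partial table of heap2 op codes (assumed values, not exhaustive).
HEAP2_OPCODES = {
    0x10: "XLOG_HEAP2_CLEAN",
    0x20: "XLOG_HEAP2_FREEZE_PAGE",
    0x30: "XLOG_HEAP2_CLEANUP_INFO",
    0x40: "XLOG_HEAP2_VISIBLE",
    0x50: "XLOG_HEAP2_MULTI_INSERT",
}


def describe_heap2_op(info: int) -> str:
    """Hypothetical helper: map an xl_info byte to a heap2 op code name."""
    op = info & XLOG_HEAP_OPMASK
    return HEAP2_OPCODES.get(op, f"unknown op code {op}")


if __name__ == "__main__":
    # 32 is the value reported in the PANIC message.
    print(describe_heap2_op(32))
```

A redo routine built from a table that lacks the 0x20 entry would fall through to the "unknown op code" branch for this record, which is the branch the PANIC message comes from.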