On Thu, Dec 6, 2012 at 9:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> "MauMau" <maumau...@gmail.com> writes:
>> I'm using PostgreSQL 9.1.6 on Linux. I encountered a serious problem that
>> media recovery failed showing the following message:
>> FATAL: archive file "000000010000008000000028" has wrong size: 7340032
>> instead of 16777216
>
> Well, that's unfortunate, but it's not clear that automatic recovery is
> possible. The only way out of it would be if an undamaged copy of the
> segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
> not even be trying to fetch from the archive if we had a local copy.

I'm inclined to agree with this. I've had a case much like the original
poster's (too-short WAL segments because of media issues), except that in
my case the archiver had also archived a bogus copy of the data (short
length and all), so our attempt to recover from archives on a brand new
system was met with an obscure failure[0].

Rather interestingly, the WAL disk was able to "write" bogusly without
error for many minutes, which made for a fairly exotic recovery based on
pg_resetxlog. I grabbed quite a few minutes' worth of WAL segments of
various sub-16MB sizes to spot-check the situation.

It never occurred to me that there was a way to really fix this that
still involves the archiver reading from a file system: what can one do
when one no longer trusts read() to get what was write()d?

[0]: Postgres wasn't very good about reporting the failure: when bogus
files have been restored from archives, it seems to just bounce through
the timelines it knows about, searching for a WAL segment it likes,
without any real error message along the lines of "got corrupt WAL from
archive". That could probably be fixed.

--
fdr
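
As an aside, the kind of size check behind the FATAL message quoted above
can be sketched in a few lines. This is only an illustration, not the
actual PostgreSQL code: the function name check_segment_size and the
hard-coded 16 MB default are assumptions made for the example.

```python
import os

# Assumption: the default WAL segment size of 16 MB (16777216 bytes),
# which is the expected size named in the quoted error message.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def check_segment_size(path, expected=WAL_SEGMENT_SIZE):
    """Return None if the file has the expected size, else an error string.

    Hypothetical helper for illustration; it only compares lengths.
    """
    actual = os.path.getsize(path)
    if actual != expected:
        return 'archive file "%s" has wrong size: %d instead of %d' % (
            os.path.basename(path), actual, expected)
    return None
```

A check like this only catches truncated segments. It says nothing about
a file of the right length with bogus contents, and it still relies on
the file system reporting the truth about what was written.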