Re: [wal-e] Wal-e Keeps Try to Fetch and Restore the Same WAL Segment

Daniel Farina Wed, 12 Feb 2014 08:26:30 -0800

On Feb 12, 2014 7:56 AM, "Kevin Harriss" <[email protected]> wrote:
>
>
>
> On Tuesday, February 11, 2014 5:43:24 PM UTC-6, Daniel Farina wrote:
>>
>> On Tue, Feb 11, 2014 at 3:39 PM, Kevin Harriss <[email protected]> wrote:
>> >>
>> >>
>> >> The fact that WAL-E suggests it's downloading the log is troubling.
>> >>
>> >> I've seen WAL corruption manifest this way: postgres will look at the
>> >> segment, give up, but then try restoring again without so much as a
>> >> peep if memory serves.  Is postgres complaining somewhere?
>> >
>> >
>> > There aren't any postgres errors or complaints in any of the logs. It just
>> > always says it is waiting to start up when a client tries to connect to
>> > the slave.
>>
>> Yeah, it's stuck in crash recovery, perhaps vainly hoping to someday
>> escape.
>>
>> You can try wal-fetching this segment and placing it in pg_xlog, then
>> turning off archiving.  Maybe Postgres will be convinced to die and
>> tell you why, then.
>>
>> >> Sadly, the last time I figured this out it was a corruption so severe
>> >> that I downloaded the WAL to break it open and noticed it had very
>> >> much the wrong file size, as did all the WAL leading up to it before
>> >> an EBS crash.  Somehow the server continued on happily for hours
>> >> afterwards, which did not make for an easy recovery (I was lucky that
>> >> there was not a double-failure and pg_resetxlog plus dump/restore was
>> >> available to me).
>> >>
>> >> It could also be a more pedestrian bug somewhere else, but if so, it'd
>> >> be the first.
>> >>
>> >> Try a new base backup/restore and cross your fingers, and perhaps
>> >> preserve 000000010000001C000000CE and try running it through xlogdump
>> >> and submitting information to pgsql-bugs if things are amiss.
>> >
>> >
>> > Are you recommending to push a fresh backup from the master to S3 and
>> > then do a fresh restore on the slave?
>>
>> Yes.  You probably want to do this first before digging around in the
>> old system for forensics.
>
>
> On the slave, I would do the following steps:
> 1 stop postgres
> 2 delete the $PGDATA directory
> 3 envdir /etc/wal-e.d/env wal-e backup-fetch $PGDATA LATEST
> 4 start up postgres and watch the recovery


Yeah.
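
For what it's worth, here is a minimal sketch of those four steps as a
script. It is not something I have run against your setup. The default
PGDATA path, the use of pg_ctl to stop and start the server, and the
DRY_RUN switch are all assumptions of mine; only the backup-fetch line
comes from your message. By default it just prints what it would do.

```sh
#!/bin/sh
# Sketch of the four rebuild steps for the slave, under stated assumptions.
# ASSUMPTIONS (not from the thread): the default PGDATA path below, that
# postgres is controlled with pg_ctl, and the DRY_RUN switch itself.
set -eu

PGDATA="${PGDATA:-/var/lib/postgresql/9.3/main}"  # assumed path, adjust
WALE_ENV="${WALE_ENV:-/etc/wal-e.d/env}"          # envdir directory from the thread
DRY_RUN="${DRY_RUN:-1}"                           # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_standby() {
    # 1. stop postgres
    run pg_ctl -D "$PGDATA" stop -m fast
    # 2. delete the old data directory (destructive: check PGDATA first)
    run rm -rf "$PGDATA"
    # 3. fetch the most recent base backup
    run envdir "$WALE_ENV" wal-e backup-fetch "$PGDATA" LATEST
    # 4. start postgres again, then watch the server log for recovery progress
    run pg_ctl -D "$PGDATA" start
}

rebuild_standby
```

Run it once as is to read the printed commands, then with DRY_RUN=0 as the
postgres user. One thing to watch: step 2 also deletes recovery.conf if it
lives in the data directory, so put it back (with its wal-fetch
restore_command) between steps 3 and 4, or the server will not pull WAL
from the archive.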
