On Feb 12, 2014 7:56 AM, "Kevin Harriss" <[email protected]> wrote:
>
> On Tuesday, February 11, 2014 5:43:24 PM UTC-6, Daniel Farina wrote:
>>
>> On Tue, Feb 11, 2014 at 3:39 PM, Kevin Harriss <[email protected]> wrote:
>> >>
>> >> The fact that WAL-E suggests it's downloading the log is troubling.
>> >>
>> >> I've seen WAL corruption manifest this way: postgres will look at the
>> >> segment, give up, but then try restoring again without so much as a
>> >> peep if memory serves. Is postgres complaining somewhere?
>> >
>> > There aren't any postgres errors or complaints in any of the logs. It just
>> > always says it is waiting to startup when a client tries to connect to the
>> > slave.
>>
>> Yeah, it's stuck in crash recovery, perhaps vainly hoping to someday escape.
>>
>> You can try wal-fetching this segment and placing it in pg_xlog, then
>> turning off archiving. Maybe Postgres will be convinced to die and
>> tell you why, then.
>>
>> >> Sadly, the last time I figured this out it was a corruption so severe
>> >> that I downloaded the WAL to break it open and noticed it had very
>> >> much the wrong file size, as were all the WAL leading up to it before
>> >> an EBS crash. Somehow the server continued on happily for hours
>> >> afterwards which did not make for an easy recovery (I was lucky that
>> >> there was not a double-failure and pg_resetxlog plus dump/restore was
>> >> available to me).
>> >>
>> >> It could also be a more pedestrian bug somewhere else, but if so, it'd
>> >> be the first.
>> >>
>> >> Try a new base backup/restore and cross your fingers, and perhaps
>> >> preserve 000000010000001C000000CE and try running it through xlogdump
>> >> and submitting information to pgsql-bugs if things are amiss.
>> >
>> > Are you recommending to push a fresh backup from the master to S3 and then
>> > do a fresh restore on the slave?
>>
>> Yes. You probably want to do this first before digging around in the
>> old system for forensics.
>
> On the slave I would do the following steps:
> 1. stop postgres
> 2. delete the $PGDATA directory
> 3. envdir /etc/wal-e.d/env wal-e backup-fetch $PGDATA LATEST
> 4. start up postgres and watch the recovery

Yeah.
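
For reference, the four slave-side steps quoted above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the thread: the `run` and `rebuild_slave` helper names, the `DRY_RUN` switch, the default `PGDATA` path, and the use of `pg_ctl` to stop and start the server are all illustrative (your system may use an init script or `pg_ctlcluster` instead). Only the `envdir ... wal-e backup-fetch` line comes from the message itself. By default it just prints the commands.

```shell
#!/bin/sh
# Sketch: rebuild a slave from the latest WAL-E base backup.
# Assumed defaults; override them in the environment as needed.
PGDATA="${PGDATA:-/var/lib/postgresql/9.3/main}"   # hypothetical data directory
WALE_ENV="${WALE_ENV:-/etc/wal-e.d/env}"           # envdir path used in the thread
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_slave() {
    # 1. stop postgres
    run pg_ctl stop -D "$PGDATA" -m fast &&
    # 2. delete the old data directory
    run rm -rf "$PGDATA" &&
    # 3. fetch the most recent base backup from S3
    run envdir "$WALE_ENV" wal-e backup-fetch "$PGDATA" LATEST &&
    # 4. start postgres; watching the recovery means following the server log
    run pg_ctl start -D "$PGDATA"
}
```

Calling `rebuild_slave` with the default `DRY_RUN=1` prints the four commands for review; set `DRY_RUN=0` to execute them. On a standby of this vintage a `recovery.conf` with a `restore_command` presumably has to be back in place before the final start, which the sketch does not handle.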
