Re: [wal-e] Problem with PITR recovery

justin Mon, 18 Aug 2014 12:00:06 -0700


On Wednesday, February 5, 2014 10:31:25 AM UTC-8, Daniel Farina wrote:
>
> On Wed, Feb 5, 2014 at 3:28 AM, Dan Fairs <[email protected]> wrote:
> >
> > [snip]
> >
> >> It therefore looks (with exactly 2 data points...) like boto could be
> >> the culprit - it *seems* like it may be possible for it to corrupt files in
> >> the face of connection problems. We don't have anything but circumstantial
> >> evidence for this, but if it happens again, it's the first place we'll
> >> look.
> > 
> > 
> > It's also worth mentioning that both our Riak CS-based system and WAL-E
> > use boto in conjunction with gevent (the Riak system uses gevent 1.0).
>
> That seems a bit scary.  One of my colleagues, Greg Stark, has been 
> looking into removing gevent, although not for this reason (the 
> reasons are simplicity and performance).  Perhaps as a windfall his 
> patch can be used to test your hypothesis. 
>
> Another project that has some interest is implementing checksumming 
> manifests on the upload side that could be re-checked during download, 
> which would maybe also help pin down the problem. 
>

Has any of the above been done, or has there been any progress in
understanding this problem at all?

I'm running into this with a large (~1TB) database, where a backup-push and
a backup-fetch each take the better part of a day. For the past several days
the restored database has not been able to catch up, despite constantly
processing WAL files, with _tons_ of these errors about transaction 0
interspersed.

If I remove the restore_command, I can start the database and communicate
with it, so it seems odd that this would be corruption introduced by
backup-push or backup-fetch. That said, I certainly haven't tried to access
or manipulate all of the records, so it's possible.
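
For what it's worth, the checksumming-manifest idea quoted above could look
roughly like the following. This is only a sketch of the general technique,
not WAL-E's code or API: the function names (sha256_of_file, build_manifest,
verify_manifest) and the choice of SHA-256 are assumptions made for
illustration. It hashes every file under a directory before upload and
re-checks the same digests after download.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Upload side (hypothetical): map each relative path to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of_file(full)
    return manifest


def verify_manifest(root, manifest):
    """Download side (hypothetical): list files that are missing or differ."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of_file(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

With something like this, the manifest would be stored alongside the backup
on push, and an empty list from verify_manifest after a fetch would at least
rule out corruption in transit as the cause.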

