On Wed, Feb 5, 2014 at 3:28 AM, Dan Fairs <[email protected]> wrote:
>>>>
>>>> Right, that's what I was afraid of. I'm currently restoring from a pg_dump 
>>>> just to check that we can recover. I suspect the next step will be to take 
>>>> another server, restore a pg_dump'd backup on it, and try a WAL-E setup on 
>>>> that one. If that works, then I expect we'll have to dump and reload our 
>>>> production server. Frustrating, as this all worked smoothly in our test 
>>>> environments! That's life, I guess...
>>>
>>> Yeah.  Testing backups is still a struggle -- even superficially
>>> starting up the cluster is not enough.  Some extra checking or
>>> monitoring integration will probably be seen in WAL-E over time,
>>> particularly with regard to Postgres checksums and figuring out how to
>>> deal with file system failures for those using checksummed file
>>> systems, but that is a ways off.
>
> [snip]
>
>> It therefore looks (with exactly 2 data points...) like boto could be the 
>> culprit - it *seems* like it may be possible for it to corrupt files in the 
>> face of connection problems. We don't have anything but circumstantial 
>> evidence for this, but if it happens again, it's the first place we'll look.
>
>
> It's also worth mentioning that both our Riak CS-based system and WAL-E use 
> boto in conjunction with gevent (the Riak system uses gevent 1.0).

That seems a bit scary.  One of my colleagues, Greg Stark, has been
looking into removing gevent, although not for this reason (the
reasons are simplicity and performance).  Perhaps as a windfall his
patch can be used to test your hypothesis.

Another project that has attracted some interest is implementing
checksum manifests on the upload side that could be re-checked during
download, which might also help pin down the problem.
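
To make the manifest idea concrete, here is a minimal sketch. It is not
WAL-E code. The function names (sha256_of, build_manifest,
verify_manifest) and the choice of SHA-256 are assumptions made for
illustration. It works on local directories only, and leaves out the
step of storing the manifest next to the uploaded files.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Upload side (hypothetical): map each relative path to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify_manifest(root, manifest):
    """Download side (hypothetical): list paths that are missing or differ."""
    mismatched = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            mismatched.append(rel)
    return mismatched
```

The manifest would be built before upload and checked after a restore.
A non-empty result from verify_manifest names the files that changed
somewhere between the two steps, which narrows down where to look.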
