>>> Right, that's what I was afraid of. I'm currently restoring from a pg_dump
>>> just to check that we can recover. I suspect the next step will be to take
>>> another server, restore a pg_dump'd backup on it, and try a WAL-E setup on
>>> that one. If that works, then I expect we'll have to dump and reload our
>>> production server. Frustrating, as this all worked smoothly in our test
>>> environments! That's life, I guess...
>>
>> Yeah. Testing backups is still a struggle -- even superficially
>> starting up the cluster is not enough. Some extra checking or
>> monitoring integration will probably be seen in WAL-E over time,
>> particularly with regard to Postgres checksums and figuring out how to
>> deal with file system failures for those using checksummed file
>> systems, but that is a ways off.

[snip]

> It therefore looks (with exactly 2 data points...) like boto could be the
> culprit - it *seems* like it may be possible for it to corrupt files in the
> face of connection problems. We don't have anything but circumstantial
> evidence for this, but if it happens again, it's the first place we'll look.

It's also worth mentioning that both our Riak CS-based system and WAL-E use
boto in conjunction with gevent (the Riak system uses gevent 1.0).

Cheers,
Dan
--
Dan Fairs | [email protected] | @danfairs | secondsync.com
