On Thu, Jan 30, 2014 at 1:44 AM, Dan Fairs <[email protected]> wrote:
> (Bringing back on list)
>
>>>>> Sad to say, I did not see what I was looking for -- which is a retry
>>>>> failing to process, ugly messages aside.
>>>>>
>>>>> If you can reproduce this defect from the same base backup
>>>>> in-triplicate, it's almost certainly corruption of some kind.  If not
>>>>> -- e.g. sometimes it works -- it could be a WAL-E bug on the fetch side.
>>>>>
>>>>> I'd be very grateful if you'd give it a try.  I haven't been able to
>>>>> produce this defect on restoring a base backup myself.
>>>>>
>>>>
>>>> I've got a restore running currently from a new base backup which just
>>>> finished. I'll let that finish, and fully recover (hopefully!) just to
>>>> satisfy myself that this setup is basically working; after that, I'll try
>>>> again from the failing base backup. It takes quite a while to do all this,
>>>> so don't worry if you don't hear from me for a short while!
>>>
>>> I'm grateful to hear anything at any time.  WAL-E has grown up into
>>> long-haul software -- it'll still be here if you can find the time later.
>>>
>>> Well - unfortunately my second attempt with a newer base backup also failed.
>>> This is a bit of a concern now - I'd like to dig deeper into this. Should we
>>> take this back on the list?
>>
>> Sure.  Pity to say, the more times this fails, particularly with fresh
>> base backups, the more likely it seems to me you've been hit by
>> corruption.  WAL-E has a bit too much empirical reliability to be
>> easily implicated in successive defects on upload or download sides.
>
>
> Right, that's what I was afraid of. I'm currently restoring from a pg_dump 
> just to check that we can recover. I suspect the next step will be to take 
> another server, restore a pg_dump'd backup on it, and try a WAL-E setup on 
> that one. If that works, then I expect we'll have to dump and reload our 
> production server. Frustrating, as this all worked smoothly in our test 
> environments! That's life, I guess...

Yeah.  Testing backups is still a struggle -- even superficially
starting up the cluster is not enough.  WAL-E will probably gain some
extra checking or monitoring integration over time, particularly
around Postgres checksums and handling file system failures for
those using checksummed file systems, but that is a ways off.
