Running with --pool-size of 1 or 2 produces successful backups on the nodes that were failing, whereas larger pool sizes get stuck with persistent failures, as above.
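For reference, a reduced-parallelism push looks something like the following. This is a hedged sketch: the `envdir` credential directory and the PGDATA path are placeholders for this example and will differ per install.

```shell
# Retry the base backup with a single uploader to rule out load/parallelism issues.
# /etc/wal-e.d/env and the data directory path below are assumptions, not from the thread.
envdir /etc/wal-e.d/env wal-e backup-push --pool-size=1 /var/lib/postgresql/9.3/main
```

With `--pool-size=1` only one chunk is uploaded at a time, which trades backup speed for less concurrent load on the network and S3 connections.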
Really strange that this is happening consistently on some machines but not at all on others; backups on the latter set work fine with the default pool size. All the nodes have the same instance type, the same configuration, the same wal-e version, and approximately the same database sizes. Unclear what's going on here.

On Wednesday, June 4, 2014 4:36:52 PM UTC-7, Dan Robinson wrote:
> Ah, yes. The chunk upload failures are coming ~8 minutes apart for the
> same chunk, so DNS caching issues are probably not at fault.
>
> I'll try a fresh backup-push with --pool-size turned down to 1.
>
> On Wednesday, June 4, 2014 3:44:57 PM UTC-7, Daniel Farina wrote:
>> On Wed, Jun 4, 2014 at 3:08 PM, Dan Robinson <[email protected]> wrote:
>> > Is there any other info I can fetch that might be helpful in diagnosing
>> > what's going on?
>>
>> Unsure. It could be a load thing; perhaps the errors will decrease
>> with decreased parallelism?
>>
>> The DNS problems I wrote about will persist for extended periods,
>> causing retry counts easily in the thousands.
