Running with --pool-size of 1 or 2 produces successful backups on the nodes that were failing, whereas larger pool sizes get stuck with persistent failures, as above.
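For reference, a reduced-parallelism push looks something like the following. This is a hedged sketch: the `envdir` credential directory and the PGDATA path are placeholders for this example and will differ per install.

```shell
# Retry the base backup with a single uploader to rule out load/parallelism issues.
# /etc/wal-e.d/env and the data directory path below are assumptions, not from the thread.
envdir /etc/wal-e.d/env wal-e backup-push --pool-size=1 /var/lib/postgresql/9.3/main
```

With `--pool-size=1` only one chunk is uploaded at a time, which trades backup speed for less concurrent load on the network and S3 connections.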
Really strange that this is happening consistently on some machines but not at all on others; backups on the latter set work fine with the default pool size. All the nodes have the same instance type, the same configuration, the same wal-e version, and approximately the same database sizes. Unclear what's going on here.

On Wednesday, June 4, 2014 4:36:52 PM UTC-7, Dan Robinson wrote:
> Ah, yes. The chunk upload failures are coming ~8 minutes apart for the
> same chunk, so DNS caching issues are probably not at fault.
>
> I'll try a fresh backup-push with --pool-size turned down to 1.
>
> On Wednesday, June 4, 2014 3:44:57 PM UTC-7, Daniel Farina wrote:
>> On Wed, Jun 4, 2014 at 3:08 PM, Dan Robinson <[email protected]> wrote:
>> > Is there any other info I can fetch that might be helpful in diagnosing
>> > what's going on?
>>
>> Unsure. It could be a load thing; perhaps the errors will decrease
>> with decreased parallelism?
>>
>> The DNS problems I wrote about will persist for extended periods,
>> causing retry counts easily in the thousands.
