I think we can discard the 500 error for now. I just can't find it, and for all I know at this point it was a one-off.
The database is 2.5TB and it's running on an ec2 hs1.8xlarge. It's sitting on the ephemeral disks in a raid0.

On Thu, Jan 22, 2015 at 4:18 PM, Daniel Farina <[email protected]> wrote:

> On Thu, Jan 22, 2015 at 1:16 PM, Brian Scholl <[email protected]> wrote:
> > Reducing the backup-push pool size to 1 worked, it takes almost 18hrs but
> > the server doesn't become completely inaccessible. I did end up disabling
> > tso and sg on eth0 to work around the "rides the rocket" errors. It still
> > feels a little spikey when connected via SSH (delays in connecting, delays
> > in commands) but it's totally survivable.
> >
> > Network utilization looks pegged throughout backup-push. I'm not sure if
> > that's expected given my configuration. I've attached the ec2 monitoring
> > graphs for disk read, disk write, and network over the past 24 hours.
> >
> > Daniel, I think the only option I haven't tried yet is the
> > --cluster-read-rate-limit. Do you still think that could be helpful? If
> > so, could you provide some guidance as far as expected behavior and picking
> > a rate?
>
> How big is this database, and what is the 500 you see otherwise?
>
> I have used 10MiB/s for nominal databases with success, but with
> backups taking 18 hours, it would appear you have some combination of
> a large database on tiny resources.
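
For picking a rate, a rough back-of-envelope check may help. The sketch below is only arithmetic on the numbers in this thread (2.5TB, 10MiB/s, 18 hours). It assumes decimal terabytes and that the limit caps bytes read from the cluster directory per second, and it ignores compression and upload overhead. The helper names are made up for illustration.

```python
# Rough estimate only: assumes 2.5 TB means 2.5e12 bytes and that the
# rate limit is the only bottleneck (no compression or upload effects).
DB_BYTES = 2.5e12        # database size from the thread
MIB = 1024 ** 2          # bytes per MiB


def hours_at_rate(db_bytes, rate_bytes_per_s):
    """Hours needed to read db_bytes at a fixed read rate."""
    return db_bytes / rate_bytes_per_s / 3600


def rate_for_hours(db_bytes, hours):
    """Read rate in bytes/s needed to finish db_bytes in the given hours."""
    return db_bytes / (hours * 3600)


if __name__ == "__main__":
    print(f"{hours_at_rate(DB_BYTES, 10 * MIB):.1f} h at 10 MiB/s")
    print(f"{rate_for_hours(DB_BYTES, 18) / MIB:.1f} MiB/s for an 18 h backup")
```

Under those assumptions, a 10MiB/s limit would stretch the backup to roughly 66 hours, while the current 18 hour run works out to an average of about 37MiB/s, so any useful limit would probably sit somewhere between the two.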
