I think we can disregard the 500 error for now.  I can't find it again, and
for all I know at this point it was a one-off.

The database is 2.5TB and it's running on an EC2 hs1.8xlarge.  It's sitting
on the ephemeral disks in a RAID 0.
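
For a rough sense of scale, here is a back-of-the-envelope sketch of the
numbers in this thread.  It assumes "2.5TB" means 2.5 * 10**12 bytes and that
the 18-hour backup read the whole cluster exactly once; neither is confirmed
above, and the function names are just illustrative.

```python
# Rough throughput arithmetic for the backup discussed in this thread.
# Assumptions (not confirmed in the thread):
#   - "2.5TB" is 2.5 * 10**12 bytes
#   - the 18-hour backup-push read the entire cluster once
DB_BYTES = 2.5 * 10**12
MIB = 1024 * 1024


def average_rate_mib_s(total_bytes, hours):
    """Average read rate in MiB/s needed to cover total_bytes in `hours`."""
    return total_bytes / (hours * 3600) / MIB


def hours_at_rate(total_bytes, rate_mib_s):
    """Hours needed to read total_bytes at a fixed rate in MiB/s."""
    return total_bytes / (rate_mib_s * MIB) / 3600


observed = average_rate_mib_s(DB_BYTES, 18)  # rate implied by the 18hr run
capped = hours_at_rate(DB_BYTES, 10)         # duration at the suggested 10MiB/s

print(f"average rate over 18 hours: {observed:.1f} MiB/s")
print(f"duration at 10 MiB/s:       {capped:.1f} hours")
print(f"10 MiB/s in bytes/s:        {10 * MIB}")
```

If those assumptions hold, a 10MiB/s cap would stretch the backup well past
the current 18 hours, so a limit for this database would probably need to sit
somewhere between that figure and the current average.  As far as I know
--cluster-read-rate-limit takes a value in bytes per second, which is why the
last line prints that unit, but that is worth checking against the README.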

On Thu, Jan 22, 2015 at 4:18 PM, Daniel Farina <[email protected]> wrote:

> On Thu, Jan 22, 2015 at 1:16 PM, Brian Scholl <[email protected]> wrote:
> > Reducing the backup-push pool size to 1 worked, it takes almost 18hrs but
> > the server doesn't become completely inaccessible.  I did end up disabling
> > tso and sg on eth0 to work around the "rides the rocket" errors.  It still
> > feels a little spikey when connected via SSH (delays in connecting, delays
> > in commands) but it's totally survivable.
> >
> > Network utilization looks pegged throughout backup-push.  I'm not sure if
> > that's expected given my configuration.  I've attached the ec2 monitoring
> > graphs for disk read, disk write, and network over the past 24 hours.
> >
> > Daniel, I think the only option I haven't tried yet is the
> > --cluster-read-rate-limit.  Do you still think that could be helpful?  If
> > so, could you provide some guidance as far as expected behavior and picking
> > a rate?
>
> How big is this database, and what is the 500 you see otherwise?
>
> I have used 10MiB/s for nominal databases with success, but with
> backups taking 18 hours, it would appear you have some combination of
> a large database on tiny resources.
>

