On Fri, Oct 7, 2011 at 12:36 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Sean Laurent <s...@studyblue.com> writes:
> > We've been running into a particularly strange problem that I'm trying to
> > better understand. The super short version is that our application servers
> > lose their connection to the database when I run a backup during periods of
> > higher load and fail to reconnect.
>
> That's just weird.  It sounds like the "xfs_freeze" operation, or the
> snapshotting operation, is somehow interrupting network traffic.  I'd
> not expect such a thing on a normal server, but who knows what's
> connected to what in an Amazon EC2 instance?
>
> Anyway, I'd suggest trying to instrument something to prove or disprove
> that there's a networking failure involved.  It might be as simple as
> watching "ping" behavior ...

Agreed that it's very weird. EBS volumes are effectively
network-attached storage, so blaming network connectivity was my first
inclination as well. Unfortunately, it's definitely not a network
failure:

- The AWS support team has not detected any network outages affecting
the EC2 instance or the EBS volumes at any time remotely near when our
outages occurred.
- I can consistently ping the database instance from the application
servers while the problem is occurring.
- I can SSH into the database instance and access Postgres while the
problem is occurring.
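
For anyone who wants to repeat the ping check at the TCP level, here is
a minimal sketch, not something we actually ran. It assumes Python 3 is
available on an application server; probe_tcp and watch are
hypothetical helper names, and the host and port are whatever your
database listens on (5432 is the Postgres default). Note that a
successful TCP connect only shows the port is accepting connections; it
does not exercise authentication or run a query.

```python
import socket
import time
from datetime import datetime, timezone


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False, time.monotonic() - start, str(exc)


def watch(host, port, interval=1.0, count=None):
    """Print one timestamped line per probe; loop forever when count is None."""
    n = 0
    while count is None or n < count:
        ok, elapsed, err = probe_tcp(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "ok" if ok else "FAIL " + err
        print(f"{stamp} {host}:{port} {elapsed * 1000:.1f}ms {status}", flush=True)
        n += 1
        time.sleep(interval)
```

Calling watch("your-db-host", 5432) from an application server while
the backup runs gives a timestamped log that can be lined up against
the xfs_freeze and snapshot times.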

--
Sean Laurent
Director of Operations
StudyBlue, Inc.

