Thanks for the reply, Tom.  I admit we were a bit rushed during the
troubleshooting process; now that we know precisely how to identify these
procs and deal with them, I imagine we'll grab more info next time before
killing them, including a netstat view and a stack trace per your
recommendation.

Cheers,
-Chris.

On Mon, May 16, 2011 at 1:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Christopher Opena <counterv...@gmail.com> writes:
> > Hello all,
> > First time poster here - probably a good sign since I've been running
> > postgresql with zero issues for the last several years!  At any rate,
> I've
> > recently run into a strange issue.  Client-based EOFs are nothing new to
> our
> > application; people can sometimes close a connection for a number of
> > reasons.  However, on the DB side these have always been released with no
> > issue.
>
> > Today, we had 4 connections that saw either a client-based EOF or a
> > 'canceling statement due to user request', and in all of these 4 cases
> the
> > query remained active and started chewing heavily into the cpu such that
> > they ground the db server to a halt until each of the procpids were
> manually
> > issued a 'pg_cancel_backed ( procpid );' from the console.  None of the 4
> > queries were particularly heavy; they registered approximately a 2300ms
> > completion time using a query EXPLAIN ANALYZE.  A little long, but not
> too
> > far out of the ordinary on our usual database usage.
>
> It's always been the case that if the client drops the connection
> mid-query, the backend will notice that only when it next tries to send
> some data --- and even then, if the kernel isn't aware that the far end
> has dropped the connection, the kernel may accept and buffer quite a bit
> of data before blocking the backend from doing more work.  So if the
> query requires a lot of processing before it starts to emit data, the
> backend can do a lot of work before noticing anything is wrong.
> I suspect that these queries were expensive and you made a mistake in
> what you tested after the fact (or maybe conditions changed, eg
> autovacuum came along and updated statistics).  It's also possible that
> some network glitch affected the connections such that the server's
> kernel didn't know they were lost.  It's hard to say much more than that
> without a lot more evidence than you've provided.  If it happens again,
> you might try poking a bit harder into the situation before you kill the
> queries --- netstat's opinion of the connection statuses might be
> interesting, and so might stack traces from the busy backends.
>
>                        regards, tom lane
>

Reply via email to