On Wed, Feb 5, 2014 at 9:52 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Monday, February 3, 2014, Pweaver (Paul Weaver) <pwea...@panjiva.com>
> wrote:
>
>> We have been running into a (live lock?) issue on our production Postgres
>> instance causing queries referencing a particular table to become extremely
>> slow and our application to lock up.
>>
>> This tends to occur on a particular table that gets a lot of queries
>> against it after a large number of deletes. When this happens, the
>> following symptoms occur when queries referencing that table are run (even
>> if we stop the deleting):
>>
>
> What do you mean by "stop the deleting"?  Are you pausing the delete
> without either committing or rolling back the transaction, just holding
> it open?  Or are you stopping it cleanly, between transactions?
>

We are repeatedly running delete commands, each in its own transaction. We
stop issuing new deletes and let the in-flight ones finish cleanly.

>
> Also, how many queries are happening concurrently?  Perhaps you need a
> connection pooler.
>
Usually between 1 and 20. When it gets locked up, closer to 100-200.
We should add a connection pooler. Could the number of active queries
against the table be causing the issue?
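
To put numbers on that, something like this (assuming 9.2 or later, where
pg_stat_activity has a "state" column) would show how many backends are
actively running queries when the slowdown hits:

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;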

>
> Is the CPU time user time or system time?  What kernel version do you have?
>
Real (wall-clock) time. The kernel version is 3.2.0-26.

>
>
>> SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
>> EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to
>> complete; the query plan looks reasonable
>>
>
> This sounds like the problem we heard quite a bit about recently, where
> processes spend a lot of time fighting over the proclock while they try to
> check the commit status of open transactions.  But I don't see how
> deletes could trigger that behavior.  If the delete has not committed, the
> tuples are still visible and the LIMIT 10 is quickly satisfied.  If the
> delete has committed, the tuples quickly get hinted, and so the next query
> along should be faster.
>
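
If the hinting work after a big delete is part of the cost here, would an
explicit VACUUM right after the delete batches finish front-load it? As a
sketch (the table name is a stand-in for ours):

VACUUM ANALYZE table_name;
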
> I also don't see why the explain would be slow.  A similar problem was
> tracked down to digging through in-doubt tuples while trying to use an
> index to find the true min or max while estimating the cost of a merge
> join.  But I don't think a simple table query should lead to that, unless
> table_name is a view.  And I don't see how deletes, rather than uncommitted
> inserts, could trigger it either.
>
>
>> max_connections              | 600   | configuration file
>>
>
> That is quite extreme.  If a temporary load spike (like from the deletes
> and the hinting needed after them) slows down the select queries and you
> start more and more of them, soon you could tip the system over into kernel
> scheduler insanity with high system time.  Once in this mode, it will stay
> there until the incoming stream of queries stops and the existing ones
> clear out.  But, if that is what is occurring, I don't know why queries on
> other tables would still be fast.
>
We probably want a connection pooler anyway, but in this particular case,
the load average is fairly low on the machine running Postgres.
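
Next time it locks up, we can also check whether the stalled backends are
waiting on heavyweight locks (again assuming 9.2+ column names):

SELECT a.pid, l.locktype, l.mode, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted;

If that comes back empty while the selects are slow, the contention is
presumably at a lower level than the lock manager, which would fit the
proclock theory above.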

>
> Cheers,
>
> Jeff
>


-- 
Thank You,
Pweaver (pwea...@panjiva.com)
