On Wed, Feb 5, 2014 at 9:52 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Monday, February 3, 2014, Pweaver (Paul Weaver) <pwea...@panjiva.com>
> wrote:
>
>> We have been running into a (live lock?) issue on our production
>> Postgres instance causing queries referencing a particular table to
>> become extremely slow and our application to lock up.
>>
>> This tends to occur on a particular table that gets a lot of queries
>> against it after a large number of deletes. When this happens, the
>> following symptoms occur when queries referencing that table are run
>> (even if we stop the deleting):
>
> What do you mean by "stop the deleting"? Are you pausing the delete
> without either committing or rolling back the transaction, just holding
> it open? Or are you stopping it cleanly, between transactions?

We are repeatedly running delete commands in their own transactions. We
stop issuing new deletes and let the in-flight ones finish cleanly.

> Also, how many queries are happening concurrently? Perhaps you need a
> connection pooler.

Usually between 1 and 20. When it gets locked up, closer to 100-200. We
should add a connection pooler (a sketch of one possible setup is in the
postscript below). Would the number of active queries on the table be
causing the issue?

> Is the CPU time user time or system time? What kernel version do you
> have?

Real time - kernel 3.2.0-26.

>> SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to complete
>> EXPLAIN SELECT * FROM table_name LIMIT 10; -- takes ~45 seconds to
>> complete the explain query, the query plan looks reasonable
>
> This sounds like the problem we heard quite a bit about recently, where
> processes spend a lot of time fighting over the proclock while they try
> to check the commit status of open transactions. But I don't see how
> deletes could trigger that behavior. If the delete has not committed,
> the tuples are still visible and the LIMIT 10 is quickly satisfied. If
> the delete has committed, the tuples quickly get hinted, and so the
> next query along should be faster.
>
> I also don't see why the explain would be slow. A similar problem was
> tracked down to digging through in-doubt tuples while trying to use an
> index to find the true min or max when estimating the cost of a merge
> join. But I don't think a simple table query should lead to that,
> unless table_name is a view. And I don't see how deletes, rather than
> uncommitted inserts, could trigger it either.
>
>> max_connections | 600 | configuration file
>
> That is quite extreme. If a temporary load spike (like from the deletes
> and the hinting needed after them) slows down the select queries and
> you start more and more of them, soon you could tip the system over
> into kernel scheduler insanity with high system time. Once in this
> mode, it will stay there until the incoming stream of queries stops and
> the existing ones clear out. But, if that is what is occurring, I don't
> know why queries on other tables would still be fast.

We probably want a connection pooler anyway, but in this particular
case, the load average is fairly low on the machine running Postgres.

> Cheers,
>
> Jeff

--
Thank You,
  Pweaver (pwea...@panjiva.com)
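
P.S. To help tell heavyweight-lock waits apart from the proclock-style
contention described above, we could snapshot activity while the table
is locked up. A minimal diagnostic sketch, assuming PostgreSQL 9.2 or
later and using table_name as a stand-in for the affected table:

-- Are the slow backends blocked on a lock, or active but starved?
SELECT pid, waiting, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;

-- Which backends hold or await locks on the affected table?
SELECT l.pid, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'table_name'::regclass;

If the slow backends all show waiting = false and state = 'active',
that points at contention or scheduler thrashing rather than blocked
locks.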
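
On the delete pattern itself: one way to blunt the post-delete spike is
to delete in modest batches and vacuum once afterwards, so dead-tuple
cleanup and hint-bit setting happen in one pass instead of in the first
concurrent SELECTs. A hypothetical sketch (needs_purge and the batch
size of 10000 are placeholders, not our actual schema):

-- One batch per transaction; repeat until zero rows are deleted.
DELETE FROM table_name
WHERE ctid = ANY (ARRAY(
    SELECT ctid FROM table_name
    WHERE needs_purge          -- hypothetical predicate
    LIMIT 10000));

-- Then pay the cleanup cost once, up front.
VACUUM ANALYZE table_name;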
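
On the pooler: a transaction-pooling pgbouncer in front of Postgres
would cap concurrent backends well below max_connections = 600, even
when 100-200 clients pile up. A hypothetical sketch of pgbouncer.ini
(host, names, and pool sizes are placeholders):

[databases]
ourdb = host=127.0.0.1 port=5432 dbname=ourdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 600
default_pool_size = 20

With transaction pooling, those 100-200 client connections would share
roughly 20 server backends, which should keep the machine out of the
overload regime Jeff describes.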