Re: [PERFORM] Postgres Query Plan Live Lock

2014-02-05 Thread Claudio Freire
On Wed, Feb 5, 2014 at 4:47 PM, Pweaver (Paul Weaver)
 wrote:
>> That is quite extreme.  If a temporary load spike (like from the deletes
>> and the hinting needed after them) slows down the select queries and you
>> start more and more of them, soon you could tip the system over into kernel
>> scheduler insanity with high system time.  Once in this mode, it will stay
>> there until the incoming stream of queries stops and the existing ones clear
>> out.  But, if that is what is occurring, I don't know why queries on other
>> tables would still be fast.
>
We probably want a connection pooler anyway, but in this particular case,
the load average is fairly low on the machine running Postgres.


Indeed, if lack of connection pooling was the cause, I'd expect a huge
load average (around 100).

Can you post the output of "vmstat 6 10" and "iostat -x -m -d 6 10"
while the server is overloaded? (try to run them at the same time so
results can be correlated).

Also, some details on the hardware wouldn't hurt, like amount of RAM,
number of processors, kind of processor, whether it's a virtual
machine or a bare metal one, number of disks and disk configuration,
etc...




Re: [PERFORM] Postgres Query Plan Live Lock

2014-02-05 Thread Pweaver (Paul Weaver)
On Wed, Feb 5, 2014 at 9:52 AM, Jeff Janes  wrote:

> On Monday, February 3, 2014, Pweaver (Paul Weaver) 
> wrote:
>
>> We have been running into a (live lock?) issue on our production Postgres
>> instance causing queries referencing a particular table to become extremely
>> slow and our application to lock up.
>>
>>  This tends to occur on a particular table that gets a lot of queries
>> against it after a large number of deletes. When this happens, the
>> following symptoms occur when queries referencing that table are run (even
>> if we stop the deleting):
>>
>
> What do you mean by "stop the deleting"?  Are you pausing the delete
> without either committing or rolling back the transaction, just holding
> it open?  Are you stopping it cleanly, between transactions?
>

We are repeatedly running delete commands in their own transactions. We
stop issuing new deletes and let them finish cleanly.

>
> Also, how many queries are happening concurrently?  Perhaps you need a
> connection pooler.
>
Usually between 1 and 20. When it gets locked up, closer to 100-200.
We should add a connection pooler. Would the number of active queries on
the table be causing the issue?
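
(For reference, a quick way to watch that during an incident, as a
sketch; this assumes PostgreSQL 9.2+, where pg_stat_activity has
"state" and "waiting" columns:

  SELECT state, waiting, count(*)
  FROM pg_stat_activity
  GROUP BY state, waiting
  ORDER BY count(*) DESC;

If the "active" count climbs toward 100-200 with waiting = false, the
backends are busy on CPU rather than blocked on locks.)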

>
> Is the CPU time user time or system time?  What kernel version do you have?
>
Real time. Kernel version 3.2.0-26.

>
>
>> SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
>> EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to
>> complete the explain query, the query plan looks reasonable
>>
>
> This sounds like the problem we heard quite a bit about recently, where
> processes spend a lot of time fighting over the proclock while they try to
> check the commit status of open transactions.  But I don't see how
> deletes could trigger that behavior.  If the delete has not committed, the
> tuples are still visible and the LIMIT 10 is quickly satisfied.  If the
> delete has committed, the tuples quickly get hinted, and so the next query
> along should be faster.
>
> I also don't see why the explain would be slow.  A similar problem was
> tracked down to digging through in-doubt tuples while trying to use an
> index to find the true min or max while estimating the cost of a merge
> join.  But I don't think a simple table query should lead to that, unless
> table_name is a view.  And I don't see how deletes, rather than uncommitted
> inserts, could trigger it either.
>
>
>> max_connections | 600 | configuration file
>>
>
> That is quite extreme.  If a temporary load spike (like from the deletes
> and the hinting needed after them) slows down the select queries and you
> start more and more of them, soon you could tip the system over into kernel
> scheduler insanity with high system time.  Once in this mode, it will stay
> there until the incoming stream of queries stops and the existing ones
> clear out.  But, if that is what is occurring, I don't know why queries on
> other tables would still be fast.
>
We probably want a connection pooler anyway, but in this particular case,
the load average is fairly low on the machine running Postgres.

>
> Cheers,
>
> Jeff


-- 
Thank You,
Pweaver (pwea...@panjiva.com)


Re: [PERFORM] Postgres Query Plan Live Lock

2014-02-05 Thread Pweaver (Paul Weaver)
On Tue, Feb 4, 2014 at 9:03 PM, Peter Geoghegan  wrote:

> On Mon, Feb 3, 2014 at 1:35 PM, Pweaver (Paul Weaver)
>  wrote:
> > We have been running into a (live lock?) issue on our production Postgres
> > instance causing queries referencing a particular table to become
> > extremely slow and our application to lock up.
>
> Livelock? Really? That would imply that the query would never finish.
> A livelock is morally equivalent to an undetected deadlock.
>
Livelock is a bad term.


> > This tends to occur on a particular table that gets a lot of queries
> > against it after a large number of deletes. When this happens, the
> > following symptoms occur when queries referencing that table are run
> > (even if we stop the deleting):
> >
> > SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
> > EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to
> > complete the explain query, the query plan looks reasonable
> > EXPLAIN ANALYZE SELECT * FROM table_name LIMIT 10;  -- takes ~45
> > seconds to complete the explain analyze query, query plan looks
> > reasonable, timing stats say the query took sub-millisecond time to
> > complete
>
> Why should explain analyze say that? You'd need to catch the problem
> as it is run.
>
> > SELECT * FROM another_table LIMIT 10; -- takes sub-millisecond time
> > EXPLAIN SELECT * FROM another_table LIMIT 10; -- takes sub-millisecond
> > time, query plan looks reasonable
> >
> > This behavior only stops and the queries go back to taking
> > sub-millisecond time if we take the application issuing the SELECTs
> > offline and wait for the active queries to finish (or terminate them).
> >
> > There is not a particularly large load on the database machine at the
> > time, nor are there a particularly large number of WAL logs being
> > written (although there is a burst of WAL log writes immediately after
> > the queue is cleared).
>
> Are you aware of hint bits?
>
> https://wiki.postgresql.org/wiki/Hint_Bits

No, but why would this cause the EXPLAIN queries to be slow?
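
(One hedged way to test whether hint bits are the cost: set them ahead
of time and see whether the slowdown disappears. A sketch only, since
VACUUM also removes dead row versions and so is not a pure hint-bit
experiment:

  -- After the batched deletes commit, a plain VACUUM scans table_name,
  -- sets hint bits, and removes dead row versions, so later queries no
  -- longer pay that cost themselves.
  VACUUM table_name;
  EXPLAIN SELECT * FROM table_name LIMIT 10;
  SELECT * FROM table_name LIMIT 10;

If both statements are fast immediately after the VACUUM, the time was
going into per-tuple cleanup work rather than a lock.)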

>
>
> --
> Regards,
> Peter Geoghegan
>



-- 
Thank You,
Pweaver (pwea...@panjiva.com)


Re: [PERFORM] Postgres Query Plan Live Lock

2014-02-05 Thread bricklen
On Mon, Feb 3, 2014 at 1:35 PM, Pweaver (Paul Weaver)
wrote:

>
> table_name stats:
> ~ 400,000,000 rows
> We are deleting tens of millions of rows in 100,000-row increments over
> a few days' time prior to/during this slowdown.
>

If you issue "VACUUM" or "VACUUM ANALYZE" after each DELETE, do the SELECTs
become more responsive?
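
A sketch of that pattern, with a hypothetical predicate ("some_condition"
stands in for the real delete criterion) and assuming an indexed id
column:

  -- One 100,000-row batch per transaction.
  DELETE FROM table_name
  WHERE id IN (SELECT id FROM table_name
               WHERE some_condition
               LIMIT 100000);

  -- Reclaim the dead rows and refresh statistics before the next batch.
  VACUUM ANALYZE table_name;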


Re: [PERFORM] increasing query time after analyze

2014-02-05 Thread Claudio Freire
On Wed, Feb 5, 2014 at 8:50 AM, Katharina Koobs
 wrote:
> Hi,
>
> We have a PostgreSQL DB, version 9.3, on a SUSE Linux system.
> We ran the upgrade from PostgreSQL 8.4 to 9.3.
> After importing the database, the query time of one SQL query is about
> 30 sec.
> After running ANALYZE on the DB, the query time of this SQL query is
> about 45 minutes.
> We can see that after analyzing, the indexes are no longer used.
>
> Does anyone have an idea why ANALYZE causes this problem?


If you just restored a dump, you should do a VACUUM ANALYZE.
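
A restored dump carries over no planner statistics and no hint bits, so
something like this is the usual first step (a minimal sketch;
"some_big_table" is a placeholder name):

  VACUUM ANALYZE;                  -- whole database
  VACUUM ANALYZE some_big_table;   -- or start with the largest tables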




[PERFORM] Postgres Query Plan Live Lock

2014-02-05 Thread Jeff Janes
On Monday, February 3, 2014, Pweaver (Paul Weaver) 
wrote:

> We have been running into a (live lock?) issue on our production Postgres
> instance causing queries referencing a particular table to become extremely
> slow and our application to lock up.
>
>  This tends to occur on a particular table that gets a lot of queries
> against it after a large number of deletes. When this happens, the
> following symptoms occur when queries referencing that table are run (even
> if we stop the deleting):
>

What do you mean by "stop the deleting"?  Are you pausing the delete
without either committing or rolling back the transaction, just holding
it open?  Are you stopping it cleanly, between transactions?

Also, how many queries are happening concurrently?  Perhaps you need a
connection pooler.

Is the CPU time user time or system time?  What kernel version do you have?


> SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
> EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to
> complete the explain query, the query plan looks reasonable
>

This sounds like the problem we heard quite a bit about recently, where
processes spend a lot of time fighting over the proclock while they try to
check the commit status of open transactions.  But I don't see how
deletes could trigger that behavior.  If the delete has not committed, the
tuples are still visible and the LIMIT 10 is quickly satisfied.  If the
delete has committed, the tuples quickly get hinted, and so the next query
along should be faster.
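
One way to see the hinting from the query side, as a sketch (EXPLAIN's
BUFFERS option reports a "dirtied" counter in 9.2 and later):

  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM table_name LIMIT 10;
  -- "Buffers: shared read=... dirtied=N" with N > 0 on a read-only
  -- query generally means hint bits are being written as the tuples
  -- are examined.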

I also don't see why the explain would be slow.  A similar problem was
tracked down to digging through in-doubt tuples while trying to use an
index to find the true min or max while estimating the cost of a merge
join.  But I don't think a simple table query should lead to that, unless
table_name is a view.  And I don't see how deletes, rather than uncommitted
inserts, could trigger it either.


> max_connections | 600 | configuration file
>

That is quite extreme.  If a temporary load spike (like from the deletes
and the hinting needed after them) slows down the select queries and you
start more and more of them, soon you could tip the system over into kernel
scheduler insanity with high system time.  Once in this mode, it will stay
there until the incoming stream of queries stops and the existing ones
clear out.  But, if that is what is occurring, I don't know why queries on
other tables would still be fast.

Cheers,

Jeff



Re: [PERFORM] increasing query time after analyze

2014-02-05 Thread Pavel Stehule
Hello


2014-02-05 Katharina Koobs:

> Hi,
>
> We have a PostgreSQL DB, version 9.3, on a SUSE Linux system.
> We ran the upgrade from PostgreSQL 8.4 to 9.3.
> After importing the database, the query time of one SQL query is about
> 30 sec.
> After running ANALYZE on the DB, the query time of this SQL query is
> about 45 minutes.
> We can see that after analyzing, the indexes are no longer used.
>
> Does anyone have an idea why ANALYZE causes this problem?
>

Yes, it is possible. Sometimes, for various reasons (a strange dataset
or correlation between columns), the statistics estimates are totally
off, and stale statistics can produce better estimates than fresh ones.

Please send the EXPLAIN ANALYZE output for both the fast and slow queries.
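
For example (a sketch; the query below is a placeholder for the real one):

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM some_table WHERE some_column = 42;

  -- To test whether the planner is wrongly rejecting an index, forbid
  -- sequential scans in this session only and compare the plans:
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;
  RESET enable_seqscan;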

Regards

Pavel Stehule


>
> Thanks a lot for your help!
>
> Katharina


[PERFORM] increasing query time after analyze

2014-02-05 Thread Katharina Koobs
Hi,

We have a PostgreSQL DB, version 9.3, on a SUSE Linux system.
We ran the upgrade from PostgreSQL 8.4 to 9.3.
After importing the database, the query time of one SQL query is about
30 sec.
After running ANALYZE on the DB, the query time of this SQL query is
about 45 minutes.
We can see that after analyzing, the indexes are no longer used.

Does anyone have an idea why ANALYZE causes this problem?

Thanks a lot for your help!

Katharina