Re: Please help! Query jumps from 1s -> 4m

2020-05-13 Thread James Thompson
Just to follow up on this...
Tried increasing the stats targets last week and re-analyzing, but the query
was just as slow.
Ended up increasing prepareThreshold to prevent server-side prepares
for now (and thus the later switch to a generic plan). This 'fixed' the issue
and had no noticeable negative effect on our workloads.

I still don't understand why the plan being off makes the query so much
slower in this case (the plans I shared in the last email don't look too
different; I don't understand how the filter can add two minutes of execution
time to an index-only scan). If anyone does have thoughts on what could be
happening I would be very interested to hear them, but the main performance
problem is effectively solved.

Thanks all for the valuable help getting to the bottom of what was
happening.
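
For reference, pgJDBC's prepareThreshold is a connection parameter (it can
also be set per statement via PGStatement.setPrepareThreshold). If I recall
the driver's semantics correctly, a value of 0 disables server-side prepares
entirely, while a large value merely delays them; the host and database names
below are placeholders:

    jdbc:postgresql://dbhost/mydb?prepareThreshold=0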

On Tue, 5 May 2020 at 22:42, Tom Lane  wrote:

> James Thompson  writes:
> > The slowness occurs when the prepared statement changes to a generic
> plan.
>
> > Initial plan:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1
> > table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual
> > time=1.901..45.256 rows=65000 loops=1)
> > Output: table1alias2.uniqueid
> > Index Cond: ((table1alias2.col20 = '12345'::bigint) AND
> (table1alias2.
> > col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
> > 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> > Heap Fetches: 10
> > Buffers: shared hit=5048
>
> > after 5 executions of the statement:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1
> > table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual
> > time=125.344..126877.822 rows=65000 loops=1)
> > Output: table1alias2.uniqueid
> > Index Cond: (table1alias2.col20 = $1001)
> > Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
> > $1000])::text[]))
> > Rows Removed by Filter: 2670023
> > Heap Fetches: 428
> > Buffers: shared hit=45933 read=42060 dirtied=4
>
> Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
> hence the cost estimate, for the plan with explicit parameter values is
> way off; but the estimate for the generic plan is even more way off,
> causing the system to falsely decide that the latter is cheaper.
>
> I've speculated about refusing to believe generic cost estimates if they
> are
> more than epsilon less than the concrete cost estimate, but it's not quite
> clear how that should work or whether it'd have its own failure modes.
>
> The one thing that is totally clear is that these rowcount estimates are
> crappy.  Can you improve them by increasing the stats target for that
> table?  Maybe with less-garbage-y inputs, the system would make the right
> plan choice here.
>
> regards, tom lane
>


Re: Please help! Query jumps from 1s -> 4m

2020-05-05 Thread Tom Lane
James Thompson  writes:
> The slowness occurs when the prepared statement changes to a generic plan.

> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual
> time=1.901..45.256 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.
> col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
> 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> Heap Fetches: 10
> Buffers: shared hit=5048

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual
> time=125.344..126877.822 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: (table1alias2.col20 = $1001)
> Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
> $1000])::text[]))
> Rows Removed by Filter: 2670023
> Heap Fetches: 428
> Buffers: shared hit=45933 read=42060 dirtied=4

Yeah, this is a dynamic we've seen before.  The rowcount estimate, and
hence the cost estimate, for the plan with explicit parameter values is
way off; but the estimate for the generic plan is even more way off,
causing the system to falsely decide that the latter is cheaper.

I've speculated about refusing to believe generic cost estimates if they are
more than epsilon less than the concrete cost estimate, but it's not quite
clear how that should work or whether it'd have its own failure modes.

The one thing that is totally clear is that these rowcount estimates are
crappy.  Can you improve them by increasing the stats target for that
table?  Maybe with less-garbage-y inputs, the system would make the right
plan choice here.

regards, tom lane
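
The stats-target change suggested above is applied per column and takes effect
after a re-analyze; a sketch, using the redacted names from the plans (the
value 1000 is only an example - the default is 100):

    ALTER TABLE table1 ALTER COLUMN col20 SET STATISTICS 1000;
    ALTER TABLE table1 ALTER COLUMN col8 SET STATISTICS 1000;
    ANALYZE table1;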




Re: Please help! Query jumps from 1s -> 4m

2020-05-05 Thread Justin Pryzby
On Tue, May 05, 2020 at 10:10:18PM +0100, James Thompson wrote:
> I've managed to replicate this now with prepared statements. Thanks for all
> the guidance so far.
> 
> The slowness occurs when the prepared statement changes to a generic plan.
> 
> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 
> table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual 
> time=1.901..45.256 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: ((table1alias2.col20 = '12345'::bigint) AND 
> (table1alias2.col8 = ANY 
> ('{c5986b02-3a02-4639-8147-f286972413ba,...98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))

The rowcount is off by a factor of 1000x.

> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 
> table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual 
> time=125.344..126877.822 rows=65000 loops=1)
> Output: table1alias2.uniqueid
> Index Cond: (table1alias2.col20 = $1001)
> Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,$1000])::text[]))
> Rows Removed by Filter: 2670023

And the generic plan is estimated to be cheaper than the previous, custom
plan, but it is slower - undoubtedly due to the rowcount mis-estimate.

> The second plan looks worse to me as it's applying a filter rather than
> using an index condition? I don't understand why it's not part of the
> condition and also why this is so much slower though.
> If I force a retrieval of all index rows for col20 = '12345' using an
> ad-hoc query (below, which in my mind is what the 'bad' plan is doing),
> that only takes 2s (2.7 mil rows). Where's the difference?
> 
> EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1
> WHERE datatype='TypeA' AND col20 = 12345;
> -> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> (cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645 
> rows=2735023 loops=1)

I see you're querying on datatype, which I think isn't in the original query
(but I'm not sure if it's just renamed?).

Underestimate usually means that the conditions are redundant or correlated.
You could mitigate that either by creating an index on both columns (or add
datatype to the existing index), or CREATE STATISTICS ndistinct on those
columns.  Or maybe you just need to increase the stats target for col20 (?).
Then re-analyze.

-- 
Justin
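
The CREATE STATISTICS suggestion above might be sketched as follows, using the
redacted names from the plans (extended statistics require v10+; the statistics
object name here is made up; besides ndistinct, functional-dependency
statistics ('dependencies') also target correlated equality clauses):

    CREATE STATISTICS table1_col20_datatype_ndistinct (ndistinct)
        ON col20, datatype FROM table1;
    ANALYZE table1;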




Re: Please help! Query jumps from 1s -> 4m

2020-05-05 Thread James Thompson
I've managed to replicate this now with prepared statements. Thanks for all
the guidance so far.

The slowness occurs when the prepared statement changes to a generic plan.

Initial plan:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual
time=1.901..45.256 rows=65000 loops=1)
Output: table1alias2.uniqueid
Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.
col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,...
98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
Heap Fetches: 10
Buffers: shared hit=5048

after 5 executions of the statement:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual
time=125.344..126877.822 rows=65000 loops=1)
Output: table1alias2.uniqueid
Index Cond: (table1alias2.col20 = $1001)
Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ...,
$1000])::text[]))
Rows Removed by Filter: 2670023
Heap Fetches: 428
Buffers: shared hit=45933 read=42060 dirtied=4

The second plan looks worse to me, as it's applying a filter rather than
using an index condition? I don't understand why the ANY clause isn't part of
the index condition, and also why this is so much slower.
If I force a retrieval of all index rows for col20 = '12345' using an
ad-hoc query (below, which in my mind is what the 'bad' plan is doing),
that only takes 2s (2.7 million rows). Where's the difference?

EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT COUNT(DISTINCT id) FROM table1
WHERE datatype='TypeA' AND col20 = 12345;
-> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
(cost=0.56..2762.95 rows=55337 width=8) (actual time=3.976..1655.645
rows=2735023 loops=1)
 Index Cond: (col20 = 12345)
 Heap Fetches: 417
 Buffers: shared hit=43843 read=44147 dirtied=8

>You could do: SET track_io_timing TO on;
I've not tried this yet, and haven't used it before - it sounds like there are
some risks associated with running it on a production server (e.g. clock
readings going backwards)?

>Also, you should be able to check if that's the problem by doing either:
>plan_cache_mode = force_generic_plan;
>Or (I would think) DISCARD PLANS;
I think plan_cache_mode is only available in pg12+, unfortunately? We're on 11
currently.
I just tested DISCARD PLANS locally; it didn't switch back from the generic
plan. Was that your expectation?


Re: Please help! Query jumps from 1s -> 4m

2020-05-05 Thread Justin Pryzby
On Mon, May 04, 2020 at 02:12:01PM -0500, Justin Pryzby wrote:
> On Mon, May 04, 2020 at 08:07:07PM +0100, Jamie Thompson wrote:
> > Additionally, the execution plans for the 10th + following queries look
> > fine, they have the same structure as if I run the query manually. It's not
> > that the query plan switches, it seems as though the same query plan is
> > just > 200X slower than usual.
> 
> Are you able to reproduce the problem manually ?
> 
> With/without PREPARE ?
> https://www.postgresql.org/docs/current/sql-prepare.html

Also, you should be able to check if that's the problem by doing either:
plan_cache_mode = force_generic_plan;
Or (I would think) DISCARD PLANS;

https://www.postgresql.org/docs/12/runtime-config-query.html

-- 
Justin
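
Spelled out, those two checks would be roughly the following (plan_cache_mode
exists only in v12 and later, as noted elsewhere in the thread):

    SET plan_cache_mode = force_generic_plan;  -- v12+: always pick the generic plan
    DISCARD PLANS;                             -- drop this session's cached plans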




Re: Please help! Query jumps from 1s -> 4m

2020-05-05 Thread Laurenz Albe
On Mon, 2020-05-04 at 20:12 +0100, James Thompson wrote:
> The change is abrupt, on the 10th execution (but I hadn't spotted it was 
> always after the
> same number of executions until your suggestion - thanks for pointing me in 
> that direction).
> 
> I don't see any custom configuration on our end that changes the threshold 
> for this from 5->10.
> Debugging the query call I also see that PgConnection has the 
> prepareThreshold set to 5.
> 
> Additionally, the execution plans for the 10th + following queries look fine, 
> they have the
> same structure as if I run the query manually. It's not that the query plan 
> switches,
> it seems as though the same query plan is just > 200X slower than usual.
> 
> As for the heap fetches -> as far as I can tell, on both occasions the 
> fetches are relatively
> low and shouldn't account for minutes of execution (even if one is lower than 
> the other).
> Looking through one day's logs I do find cases with lower heap fetches too, 
> for example as
> below which has 1977 fetches instead of the previous 6940 but took approx the 
> same time:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 
> table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual 
> time=56.858..120893.874 rows=67000 loops=1)
> Index Cond: (col20 = $2005)
> Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY 
> ((ARRAY[$1004, ..., $2003])::text[])))
> Rows Removed by Filter: 2662793
> Heap Fetches: 1977
> Buffers: shared hit=84574 read=3522
> 
> Would you agree the statement threshold / heap fetches seems unlikely to be 
> causing this? Any other thoughts?

It does sound suspiciously like custom plans vs. generic plan.

If you are using JDBC, then the cut-off of 10 would make sense:
the JDBC driver uses (server) prepared statements only after the
fifth execution, and the prepared statement will use a generic plan
only after the fifth execution.

It would be good to see the execution plan from the third, seventh
and thirteenth execution.  You could use "auto_explain" for that.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
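
The auto_explain approach above might look roughly like this for a single
session (it can also be preloaded server-wide via shared_preload_libraries;
note that log_analyze adds measurement overhead):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log a plan for every statement
    SET auto_explain.log_analyze = on;      -- include actual rows and timings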





Re: Please help! Query jumps from 1s -> 4m

2020-05-04 Thread James Thompson
The change is abrupt, on the 10th execution (but I hadn't spotted it was
always after the same number of executions until your suggestion - thanks
for pointing me in that direction).

I don't see any custom configuration on our end that changes the threshold
for this from 5->10. Debugging the query call I also see that PgConnection
has the prepareThreshold set to 5.

Additionally, the execution plans for the 10th + following queries look
fine, they have the same structure as if I run the query manually. It's not
that the query plan switches, it seems as though the same query plan is
just > 200X slower than usual.

As for the heap fetches -> as far as I can tell, on both occasions the
fetches are relatively low and shouldn't account for minutes of execution
(even if one is lower than the other). Looking through one day's logs I do
find cases with lower heap fetches too, for example as below, which has 1977
fetches instead of the previous 6940 but took approximately the same time:
->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual
time=56.858..120893.874 rows=67000 loops=1)
Index Cond: (col20 = $2005)
Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
((ARRAY[$1004, ..., $2003])::text[])))
Rows Removed by Filter: 2662793
Heap Fetches: 1977
Buffers: shared hit=84574 read=3522

Would you agree the statement threshold / heap fetches seems unlikely to be
causing this? Any other thoughts?

Thanks!

On Sun, 3 May 2020 at 16:38, Justin Pryzby  wrote:

> On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> > Hi,
> >
> > Hoping someone can help with this performance issue that's been driving a
> > few of us crazy :-) Any guidance greatly appreciated.
> >
> > A description of what you are trying to achieve and what results you
> > expect.:
> >  - I'd like to get an understanding of why the following query (presented
> > in full, but there are specific parts that are confusing me) starts off
> > taking ~second in duration but 'switches' to taking over 4 minutes.
>
> Does it "switch" abruptly or do you get progressively slower queries ?
> If it's abrupt following the 5th execution, I guess you're hitting this:
>
>
> https://www.postgresql.org/message-id/pine.bso.4.64.0802131404090.6...@leary.csoft.net
>
> https://www.postgresql.org/message-id/a737b7a37273e048b164557adef4a58b50fb8...@ntex2010i.host.magwien.gv.at
>
> >  - we initially saw this behaviour for the exact same sql with a
> different
> > index that resulted in an index scan. To try and fix the issue we've
> > created an additional index with additional included fields so we now
> have
> > Index Only Scans, but are still seeing the same problem.
>
> >  Segments of interest:
> >  1. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> > table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual
> > time=110.539..123828.134 rows=67000 loops=1)
> > Index Cond: (col20 = $2005)
> > Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text =
> ANY
> > ((ARRAY[$1004, ..., $2003])::text[])))
> > Rows Removed by Filter: 2662652
> > Heap Fetches: 6940
> > Buffers: shared hit=46619 read=42784 written=52
>
> > If I run the same queries now:
> > Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> > table1alias1  (cost=0.56..2549.69 rows=69 width=36)
> > (actual time=1.017..1221.375 rows=67000 loops=1)
> > Heap Fetches: 24
> > Buffers: shared hit=2849 read=2483
>
> It looks to me like you're getting good performance following a vacuum,
> when
> Heap Fetches is low.  So you'd want to run vacuum more often, like:
> | ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).
>
> But maybe I've missed something - you showed the bad query plan, but not
> the
> good one, and I wonder if they may be subtly different, and that's maybe
> masked
> by the replaced identifiers.
>
> --
> Justin
>


Re: Please help! Query jumps from 1s -> 4m

2020-05-04 Thread David Rowley
On Mon, 4 May 2020 at 02:35, James Thompson  wrote:
> buffers do look different - but still, reading 42k doesn't seem like it would 
> cause a delay of 4m?

You could do: SET track_io_timing TO on;

then: EXPLAIN (ANALYZE, BUFFERS) your query and see if the time is
spent doing IO.

David
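
In full, that check might look like the following (substitute the actual slow
statement for the placeholder):

    SET track_io_timing TO on;
    EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the slow query here
    -- then look for the "I/O Timings:" lines in the output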




Re: Please help! Query jumps from 1s -> 4m

2020-05-04 Thread Justin Pryzby
On Mon, May 04, 2020 at 08:07:07PM +0100, Jamie Thompson wrote:
> Additionally, the execution plans for the 10th + following queries look
> fine, they have the same structure as if I run the query manually. It's not
> that the query plan switches, it seems as though the same query plan is
> just > 200X slower than usual.

Are you able to reproduce the problem manually ?

With/without PREPARE ?
https://www.postgresql.org/docs/current/sql-prepare.html

-- 
Justin
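
A manual reproduction with PREPARE might look roughly like this, simplified to
a single parameter (the real query binds ~1000; column and table names as in
the redacted plans):

    PREPARE q(bigint) AS
        SELECT uniqueid FROM table1 WHERE col20 = $1;
    EXPLAIN (ANALYZE, BUFFERS) EXECUTE q(12345);
    -- repeat: from the sixth execution the plan cache may switch to a generic plan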




Re: Please help! Query jumps from 1s -> 4m

2020-05-03 Thread Justin Pryzby
On Sun, May 03, 2020 at 09:58:27AM +0100, James Thompson wrote:
> Hi,
> 
> Hoping someone can help with this performance issue that's been driving a
> few of us crazy :-) Any guidance greatly appreciated.
> 
> A description of what you are trying to achieve and what results you
> expect.:
>  - I'd like to get an understanding of why the following query (presented
> in full, but there are specific parts that are confusing me) starts off
> taking ~second in duration but 'switches' to taking over 4 minutes.

Does it "switch" abruptly or do you get progressively slower queries ?
If it's abrupt following the 5th execution, I guess you're hitting this:

https://www.postgresql.org/message-id/pine.bso.4.64.0802131404090.6...@leary.csoft.net
https://www.postgresql.org/message-id/a737b7a37273e048b164557adef4a58b50fb8...@ntex2010i.host.magwien.gv.at

>  - we initially saw this behaviour for the exact same sql with a different
> index that resulted in an index scan. To try and fix the issue we've
> created an additional index with additional included fields so we now have
> Index Only Scans, but are still seeing the same problem.

>  Segments of interest:
>  1. ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on
> table1 table1alias1  (cost=0.56..17.25 rows=1 width=60) (actual
> time=110.539..123828.134 rows=67000 loops=1)
> Index Cond: (col20 = $2005)
> Filter: (((col3 = $2004) OR (col3 IS NULL)) AND ((col8)::text = ANY
> ((ARRAY[$1004, ..., $2003])::text[])))
> Rows Removed by Filter: 2662652
> Heap Fetches: 6940
> Buffers: shared hit=46619 read=42784 written=52

> If I run the same queries now:
> Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1
> table1alias1  (cost=0.56..2549.69 rows=69 width=36)
> (actual time=1.017..1221.375 rows=67000 loops=1)
> Heap Fetches: 24
> Buffers: shared hit=2849 read=2483

It looks to me like you're getting good performance following a vacuum, when
Heap Fetches is low.  So you'd want to run vacuum more often, like:
| ALTER TABLE table1 SET (autovacuum_vacuum_scale_factor=0.005).

But maybe I've missed something - you showed the bad query plan, but not the
good one, and I wonder if they may be subtly different, and that's maybe masked
by the replaced identifiers.

-- 
Justin




Re: Please help

2018-03-06 Thread Tomas Vondra
Can you please use separate threads for your questions? That is, don't
start new thread by responding to an existing message (because the new
message then points using "References" header, which is what e-mail
clients use to group messages into threads). And use a proper subject
describing the issue ("Please help" tells people nothing).

That being said, how is this related to performance at all? It seems to
be about pgadmin, so please send it to pgadmin-support I guess.

regards

On 03/06/2018 09:01 AM, Daulat Ram wrote:
> Dear Experts,
> 
>  
> 
> Kindly help to resolve the issue reported during startup of pgAdmin4 in
> server mode on Ubuntu 16.04.
> 
> dram@vipgadmin:~$ cd .pgadmin4
> dram@vipgadmin:~/.pgadmin4$ chmod +x lib/python2.7/site-packages/pgadmin4/pgAdmin4.py
> dram@vipgadmin:~/.pgadmin4$ chmod lib/python2.7/site-packages/pgadmin4/pgAdmin4.py
> dram@vipgadmin:~/.pgadmin4$ chmod -R lib/python2.7/site-packages/pgadmin4/pgAdmin4.py
> dram@vipgadmin:~/.pgadmin4$ sudo systemctl daemon-reload
> dram@vipgadmin:~/.pgadmin4$ sudo systemctl enable pgadmin4
> dram@vipgadmin:~/.pgadmin4$ sudo systemctl start pgadmin4
> dram@vipgadmin:~/.pgadmin4$ sudo systemctl status pgadmin4
> 
> ● pgadmin4.service - Pgadmin4 Service
>    Loaded: loaded (/etc/systemd/system/pgadmin4.service; enabled; vendor
> preset: enabled)
>    Active: failed (Result: exit-code) since Mon 2018-03-05 23:57:24 PST;
> 10s ago
>   Process: 14190
> ExecStart=/root/.pgadmin4/lib/python2.7/site-packages/pgadmin4/pgAdmin4.py
> (code=exited, status=200/CHDIR)
> Main PID: 14190 (code=exited, status=200/CHDIR)
> 
> Mar 05 23:57:24 vipgadmin systemd[1]: Started Pgadmin4 Service.
> Mar 05 23:57:24 vipgadmin systemd[1]: pgadmin4.service: Main process
> exited, code=exited, status=200/CHDIR
> Mar 05 23:57:24 vipgadmin systemd[1]: pgadmin4.service: Unit entered
> failed state.
> Mar 05 23:57:24 vipgadmin systemd[1]: pgadmin4.service: Failed with
> result 'exit-code'.
> 
> Regards,
> Daulat

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Please help

2018-02-24 Thread PropAAS DBA

Have you looked at the Nagios XI & Core packages?

https://www.nagios.com/solutions/postgres-monitoring/





On 02/23/2018 12:31 PM, Daulat Ram wrote:


Hello team,

I need help with how, and what, we can monitor in the Postgres database via
Nagios.

I came to know about the check_postgres.pl script, but we are using the
freeware option of Postgres. If it's OK with freeware, then please let me
know the steps for how I can implement it in our environment.


Regards,

Daulat





Re: Please help

2018-02-23 Thread Andreas Kretschmer



On 23.02.2018 at 20:31, Daulat Ram wrote:


Hello team,

I need help with how, and what, we can monitor in the Postgres database via
Nagios.

I came to know about the check_postgres.pl script, but we are using the
freeware option of Postgres. If it's OK with freeware, then please let me
know the steps for how I can implement it in our environment.





you can use check_postgres from https://bucardo.org/. Please read: 
https://bucardo.org/check_postgres/check_postgres.pl.html#license_and_copyright


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com