Re: [GENERAL] Slow query plan used
"Wetzel, Juergen (Juergen)" writes: > Tom Lane writes: >> You might get some traction by creating indexes on lower(searchfield1) etc. > I will try that. Does that mean the column statistics will only be collected > when there's an index on the table/column? No; ANALYZE collects stats on plain columns automatically. The point is that your query is not on the plain column searchfield1, it is on lower(searchfield1). The planner cannot make a reasonable estimate for "something LIKE constant-pattern" unless it can find statistics that are specifically for the "something" value. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
Andreas Kretschmer writes: > please consider my plan B) and increase the stats. See my other mail. I tried that also. Combined with the partial index. But still same result. Bill Moran writes: > LIKE queries are probably challenging to plan, especially when they're > not > left-anchored: how can the planner be reasonalbly expected to estimate > how many rows will be matched by a given LIKE expression. That's clear to me. And because of that I expected the planner to use the table document as outer table in the nested loop join. Especially as here is an index available which gives a restriction to only 130 rows out of the 3. Tom Lane writes: > You might get some traction by creating indexes on lower(searchfield1) etc. > This isn't even necessarily with an expectation that the planner would use > those indexes in the plan ... but what it would do is make use of the > statistics that ANALYZE will accumulate about the indexed expressions. I > think that > would give you better estimates about the LIKE rowcounts. You might have to > crank up the statistics target for those indexes if the default isn't enough > to > make the estimates significantly better. (Obviously, don't forget to > re-ANALYZE before checking results.) I will try that. Does that mean the column statistics will only be collected when there's an index on the table/column? Thanks for all your hints. I will go on and try. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
Bill Moran writes: > LIKE queries are probably challenging to plan, especially when they're not > left-anchored: how can the planner be reasonalbly expected to estimate how > many rows will be matched by a given LIKE expression. Yeah, especially without any statistics. The core problem here appears to be the poor rowcount estimate for the LIKE: -> Seq Scan on c3k_document_index (cost=0.00..15160.48 rows=1063 width=285) (actual time=0.206..13539.353 rows=33022 loops=1) Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR (lower(searchfield8) ~~ '%sehr%'::text)) Rows Removed by Filter: 33832 That's off by a factor of 30. > Not having looked at the code, I would guess that the length of the LIKE > expression will make the planner assume that the match is more restrictive, > while many % and _ in the LIKE expression make the planner assume that the > match is less restrictive. Exactly. It's not terribly bright, but the extra %'s make it think that the pattern is less selective, producing a higher rowcount estimate that --- by chance -- better matches reality. Then you get better decisions about how to shape the rest of the plan. You might get some traction by creating indexes on lower(searchfield1) etc. This isn't even necessarily with an expectation that the planner would use those indexes in the plan ... but what it would do is make use of the statistics that ANALYZE will accumulate about the indexed expressions. I think that would give you better estimates about the LIKE rowcounts. You might have to crank up the statistics target for those indexes if the default isn't enough to make the estimates significantly better. (Obviously, don't forget to re-ANALYZE before checking results.) > That in mind, let me throw pg_trgm into the mix of things to try: > https://www.postgresql.org/docs/current/static/pgtrgm.html > The trigram module allows you to create indexes that LIKE can use > to do index searches instead of always having to do sequential scans > or push the LIKE matching to another part of the plan tree. For cases like the above, where half the table is getting selected, I would not expect the planner to use an index anyway. Any old index on lower(searchfieldN) will work in terms of inducing ANALYZE to collect stats. But if you have other queries with more selective LIKE patterns then maybe a trigram index in particular is worthwhile. regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
On Thu, 1 Jun 2017 16:45:17 +0200 Andreas Kretschmer wrote: > > Am 01.06.2017 um 14:07 schrieb Wetzel, Juergen (Juergen): > > > > Am 31.05.2017 um 13:27 schrieb Wetzel, Juergen (Juergen): > Only 130 rows out of the 3 have ARCHIVED = 0 > >>> in this case i would suggest a partial index: > >>> create index on (archived) where archived = 0; > >> Thanks, Andreas. > >> > >> Sorry for the confusion about the table names. > >> The hint with the partial index sounds as it could solve the problem. I > >> will test it. > >> > > Hi, > > > > I created now a partial index > > create index on document (archived) where archived = '0'; > > just to be sure: this syntay is wrong, missing index-name. But it seems > the index is document_archived_idx ... > > > But result is same as before: a short like expression included in doubled > > %-signs leads to a fast query plan whereas a longer like expression or use > > of single %-sign creates a much slower query. Please see below query plans. > > Most surprisingly to me is the influence of the like expression, especially > > the doubled %-sign on short expressions. Any other ideas how to speed up > > that query or what is going on here in general? LIKE queries are probably challenging to plan, especially when they're not left-anchored: how can the planner be reasonalbly expected to estimate how many rows will be matched by a given LIKE expression. Not having looked at the code, I would guess that the length of the LIKE expression will make the planner assume that the match is more restrictive, while many % and _ in the LIKE expression make the planner assume that the match is less restrictive. Extrapolate that into guessing a number of matched tuples and how that fits into the overall plan and you'll probaby give yourself a brain anuerism. While having a detailed understanding of exactly how the planner makes such decisions is certainly worthwhile, I would recommend a more pragmatic approach: try things and see what works. That in mind, let me throw pg_trgm into the mix of things to try: https://www.postgresql.org/docs/current/static/pgtrgm.html The trigram module allows you to create indexes that LIKE can use to do index searches instead of always having to do sequential scans or push the LIKE matching to another part of the plan tree. Based on your described situation, I have a theory that it might improve things quite a bit. -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
Am 01.06.2017 um 14:07 schrieb Wetzel, Juergen (Juergen): Am 31.05.2017 um 13:27 schrieb Wetzel, Juergen (Juergen): Only 130 rows out of the 3 have ARCHIVED = 0 in this case i would suggest a partial index: create index on (archived) where archived = 0; Thanks, Andreas. Sorry for the confusion about the table names. The hint with the partial index sounds as it could solve the problem. I will test it. Hi, I created now a partial index create index on document (archived) where archived = '0'; just to be sure: this syntay is wrong, missing index-name. But it seems the index is document_archived_idx ... But result is same as before: a short like expression included in doubled %-signs leads to a fast query plan whereas a longer like expression or use of single %-sign creates a much slower query. Please see below query plans. Most surprisingly to me is the influence of the like expression, especially the doubled %-sign on short expressions. Any other ideas how to speed up that query or what is going on here in general? please consider my plan B) and increase the stats. See my other mail. Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
Am 31.05.2017 um 13:27 schrieb Wetzel, Juergen (Juergen): >>> Only 130 rows out of the 3 have ARCHIVED = 0 >> in this case i would suggest a partial index: >> create index on (archived) where archived = 0; > Thanks, Andreas. > > Sorry for the confusion about the table names. > The hint with the partial index sounds as it could solve the problem. I will > test it. > Hi, I created now a partial index create index on document (archived) where archived = '0'; But result is same as before: a short like expression included in doubled %-signs leads to a fast query plan whereas a longer like expression or use of single %-sign creates a much slower query. Please see below query plans. Most surprisingly to me is the influence of the like expression, especially the doubled %-sign on short expressions. Any other ideas how to speed up that query or what is going on here in general? Thanks, Jürgen Limit (cost=24327.12..24327.21 rows=38 width=662) (actual time=15373.542..15373.565 rows=136 loops=1) Buffers: shared hit=264747 read=51242 dirtied=6 written=16 -> Sort (cost=24327.12..24327.21 rows=38 width=662) (actual time=15373.541..15373.557 rows=136 loops=1) Sort Key: document.created, document.id Sort Method: quicksort Memory: 102kB Buffers: shared hit=264747 read=51242 dirtied=6 written=16 -> Nested Loop (cost=0.42..24326.12 rows=38 width=662) (actual time=10.951..15372.914 rows=136 loops=1) Buffers: shared hit=264741 read=51242 dirtied=6 written=16 -> Seq Scan on document_index (cost=0.00..15403.68 rows=1257 width=289) (actual time=0.205..14901.743 rows=38545 loops=1) Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR (lower(searchfield8) ~~ '%sehr%'::text)) Rows Removed by Filter: 40531 Buffers: shared hit=123181 read=38272 dirtied=5 written=12 -> Index Scan using document_6720023941 on document (cost=0.42..7.09 rows=1 width=381) (actual time=0.011..0.011 rows=0 loops=38545) Index Cond: (id = document_index.documentid) Filter: (((folder)::text = ANY ('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar) AND ((umr_actualtopicid)::text = ANY ('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737 (...) Rows Removed by Filter: 1 Buffers: shared hit=141560 read=12970 dirtied=1 written=4 Total runtime: 15373.763 ms Limit (cost=34194.37..34194.74 rows=150 width=662) (actual time=181.502..181.525 rows=134 loops=1) Buffers: shared hit=8022 read=277 -> Sort (cost=34194.37..34194.83 rows=186 width=662) (actual time=181.501..181.516 rows=134 loops=1) Sort Key: document.created, c3k_document.id Sort Method: quicksort Memory: 101kB Buffers: shared hit=8022 read=277 -> Nested Loop (cost=3546.02..34187.36 rows=186 width=662) (actual time=32.660..181.064 rows=134 loops=1) Buffers: shared hit=8022 read=277 -> Bitmap Heap Scan on document (cost=3545.61..19272.79 rows=2375 width=381) (actual time=22.771..96.683 rows=458 loops=1) Recheck Cond: ((archived = '0'::bpchar) AND (ownerid = ANY ('{5000239,5000238,5000234,5000113,5000237,5000236,5000230,5000112,5000233,5000111,5000232,13,15,16,18,19,5000249,5000246,5000124,5000245,5000127,5000247,5000242,5000120,5000123 (...) Rows Removed by Index Recheck: 15733 Filter: ((actualtopicid)::text = ANY ('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737e9c4d41000900,3ecdec4d4100,4aaf905441000800,4aaf905441000e00,fc7e9c4d41000f00,11ffc (...) Rows Removed by Filter: 27 Buffers: shared hit=5677 -> BitmapAnd (cost=3545.61..3545.61 rows=6228 width=0) (actual time=22.056..22.056 rows=0 loops=1) Buffers: shared hit=2470 -> Bitmap Index Scan on document_archived_idx (cost=0.00..1131.17 rows=54784 width=0) (actual time=11.694..11.694 rows=60295 loops=1) Index Cond: (archived = '0'::bpchar) Buffers: shared hit=184 -> Bitmap Index Scan on document_ownerid_folder_status_idx (cost=0.00..2413.00 rows=8973 width=0) (actual time=8.718..8.718 rows=14962 loops=1) Index Cond: ((ownerid = ANY ('{5000239,5000238,5000234,5000113,5000237,5000236,5000230,5000112,5000233,5000111,5000232,13,15,16,18,19,5000249,5000246,5000124,5000245,5000127,5000247,5000242,5000120,5000123,5000244,5000122,50 (...) Buffers: shared hit=2286 -> Index Scan using document_7965268402 on document_index (cost=0.42..6.27 rows=1 width
Re: [GENERAL] Slow query plan used
Am 31.05.2017 um 13:27 schrieb Wetzel, Juergen (Juergen): Only 130 rows out of the 3 have ARCHIVED = 0 in this case i would suggest a partial index: create index on (archived) where archived = 0; Thanks, Andreas. Sorry for the confusion about the table names. The hint with the partial index sounds as it could solve the problem. I will test it. you are welcome. please keep me informed if that helps you. Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
>> Only 130 rows out of the 3 have ARCHIVED = 0 > in this case i would suggest a partial index: > create index on (archived) where archived = 0; Thanks, Andreas. Sorry for the confusion about the table names. The hint with the partial index sounds as it could solve the problem. I will test it. Regards Jürgen -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Slow query plan used
Am 30.05.2017 um 10:42 schrieb Wetzel, Juergen (Juergen): > I have a question concerning the query planner. I observe that chosen query plan differs on length and content of a like > search expression. We have a view combining data from two tables, both containing same number of rows (round about 3). > Used PostgreSQL version is 9.3.15 on Windows. DDL of tables and view see below. the query is on DOCUMENTDATA, but plan and DDL for c3k_documentdata and other tables. > Only 130 rows out of the 3 have ARCHIVED = 0 in this case i would suggest a partial index: create index on (archived) where archived = 0; You can also increase the statistics for the columns subject and mailContent ALTERTABLEALTERCOLUMN SETSTATISTICS=1000; (as example, the default-value is 100) Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
[GENERAL] Slow query plan used
I have a question concerning the query planner. I observe that chosen query plan differs on length and content of a like search expression. We have a view combining data from two tables, both containing same number of rows (round about 3). Used PostgreSQL version is 9.3.15 on Windows. DDL of tables and view see below. Query is: select ID, OWNERID, FOLDER, RCVSERVICE, STATUS, TASKSTATUS, CREATED from DOCUMENTDATA where FOLDER in ('*INBOX','*DELAYED') and ARCHIVED='0' and ( lower(subject) like lower('%Sehr%') or lower(mailContent) like lower('%Sehr%') ) and UMR_ACTUALTOPICID in ('f3fb34574100','804840564100','4fc81b554100','d27d9c4d41000200','e9aba54d4100','4aaf905441000a00','737e9c4d41000900', '4aaf905441000800','3ecdec4d4100','4aaf905441000e00','fc7e9c4d41000f00','4aaf905441000c00','11ffc54f4100','4aaf905441000200') and OWNERID in ('5000239','5000238','5000234','5000113','5000237','5000236') order by CREATED desc, ID desc limit 150 offset 0 Only 130 rows out of the 3 have ARCHIVED = 0 Chosen query plan is following: Limit (cost=22738.95..22739.00 rows=20 width=664) (actual time=13929.849..13929.869 rows=98 loops=1) Buffers: shared hit=221263 read=45723 -> Sort (cost=22738.95..22739.00 rows=20 width=664) (actual time=13929.848..13929.863 rows=98 loops=1) Sort Key: c3k_document.created, c3k_document.id Sort Method: quicksort Memory: 87kB Buffers: shared hit=221263 read=45723 -> Nested Loop (cost=0.42..22738.52 rows=20 width=664) (actual time=95.508..13929.478 rows=98 loops=1) Buffers: shared hit=221263 read=45723 -> Seq Scan on c3k_document_index (cost=0.00..15160.48 rows=1063 width=285) (actual time=0.206..13539.353 rows=33022 loops=1) Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR (lower(searchfield8) ~~ '%sehr%'::text)) Rows Removed by Filter: 33832 Buffers: shared hit=101130 read=33463 -> Index Scan using c3k_docume_6720023941 on c3k_document (cost=0.42..7.12 rows=1 width=387) (actual time=0.011..0.011 rows=0 loops=33022) Index Cond: (id = c3k_document_index.documentid) Filter: (((folder)::text = ANY ('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar) AND ((umr_actualtopicid)::text = ANY ('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737 (...) Rows Removed by Filter: 1 Buffers: shared hit=120133 read=12260 Total runtime: 13930.186 ms If the like expression is lower('%%Sehr%%') for fields subject and mailContent, a much faster query plan is chosen: Limit (cost=24018.18..24018.42 rows=97 width=664) (actual time=61.110..61.130 rows=98 loops=1) Buffers: shared hit=1961 read=598 -> Sort (cost=24018.18..24018.42 rows=97 width=664) (actual time=61.109..61.122 rows=98 loops=1) Sort Key: c3k_document.created, c3k_document.id Sort Method: quicksort Memory: 87kB Buffers: shared hit=1961 read=598 -> Nested Loop (cost=515.26..24014.98 rows=97 width=664) (actual time=5.193..60.851 rows=98 loops=1) Buffers: shared hit=1961 read=598 -> Bitmap Heap Scan on c3k_document (cost=514.96..16137.34 rows=1232 width=387) (actual time=2.137..10.754 rows=282 loops=1) Recheck Cond: (((folder)::text = ANY ('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar)) Filter: (((umr_actualtopicid)::text = ANY ('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737e9c4d41000900,4aaf905441000800,3ecdec4d4100,4aaf905441000e00,fc7e9c4d41000f00,11ff (...) Rows Removed by Filter: 23 Buffers: shared hit=828 read=356 -> Bitmap Index Scan on c3k_document_folder_archived_umr_orgtopicid_idx (cost=0.00..514.66 rows=6183 width=0) (actual time=1.946..1.946 rows=2847 loops=1) Index Cond: (((folder)::text = ANY ('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar)) Buffers: shared hit=7 read=113 -> Index Scan using c3k_docume_7965268402 on c3k_document_index (cost=0.29..6.38 rows=1 width=285) (actual time=0.176..0.176 rows=0 loops=282) Index Cond: (documentid = c3k_document.id) Filter: ((lower((searchfield1)::text) ~~ '%%sehr%%'::text) OR (lower(searchfield8) ~~ '%%sehr%%'::text)) Rows Removed by Filter: 1 Buffers: shared hit=1133 read=242 Total runtime: 61.385 ms But when the like expression is only a few characters longer the first - slower - query plan is used again. I thou