Re: [GENERAL] Slow query plan used

2017-06-01 Thread Tom Lane
"Wetzel, Juergen (Juergen)"  writes:
> Tom Lane  writes:
>> You might get some traction by creating indexes on lower(searchfield1) etc.

> I will try that. Does that mean the column statistics will only be collected 
> when there's an index on the table/column?

No; ANALYZE collects stats on plain columns automatically.  The point
is that your query is not on the plain column searchfield1, it is on
lower(searchfield1).  The planner cannot make a reasonable estimate
for "something LIKE constant-pattern" unless it can find statistics
that are specifically for the "something" value.
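As a minimal sketch of what that means in practice (index and table names here
are only illustrative, borrowed from the posted plans): an expression index
makes ANALYZE gather statistics for the indexed expression, and those
statistics then show up in pg_stats under the index's name:

-- Sketch only; object names are assumptions based on the posted plans.
CREATE INDEX document_index_lower_sf1_idx
    ON document_index (lower(searchfield1));
ANALYZE document_index;

-- Expression-index statistics are listed under the index name.
SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'document_index_lower_sf1_idx';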

regards, tom lane




Re: [GENERAL] Slow query plan used

2017-06-01 Thread Wetzel, Juergen (Juergen)
Andreas Kretschmer  writes:
> please consider my plan B) and increase the stats. See my other mail.

I tried that also, combined with the partial index. But still the same result.

Bill Moran  writes:
> LIKE queries are probably challenging to plan, especially when they're not
> left-anchored: how can the planner be reasonably expected to estimate how
> many rows will be matched by a given LIKE expression?

That's clear to me. And because of that I expected the planner to use the table 
document as the outer table in the nested loop join, especially as an index is 
available here which restricts the result to only 130 rows out of the 3.

Tom Lane  writes:
> You might get some traction by creating indexes on lower(searchfield1) etc.
> This isn't even necessarily with an expectation that the planner would use
> those indexes in the plan ... but what it would do is make use of the
> statistics that ANALYZE will accumulate about the indexed expressions.  I
> think that would give you better estimates about the LIKE rowcounts.  You
> might have to crank up the statistics target for those indexes if the
> default isn't enough to make the estimates significantly better.
> (Obviously, don't forget to re-ANALYZE before checking results.)

I will try that. Does that mean the column statistics will only be collected 
when there's an index on the table/column?

Thanks for all your hints. I will go on and try.





Re: [GENERAL] Slow query plan used

2017-06-01 Thread Tom Lane
Bill Moran  writes:
> LIKE queries are probably challenging to plan, especially when they're not
> left-anchored: how can the planner be reasonably expected to estimate how
> many rows will be matched by a given LIKE expression?

Yeah, especially without any statistics.  The core problem here appears
to be the poor rowcount estimate for the LIKE:

  ->  Seq Scan on c3k_document_index  (cost=0.00..15160.48 
rows=1063 width=285) (actual time=0.206..13539.353 rows=33022 loops=1)
Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR 
(lower(searchfield8) ~~ '%sehr%'::text))
Rows Removed by Filter: 33832

That's off by a factor of 30.

> Not having looked at the code, I would guess that the length of the LIKE
> expression will make the planner assume that the match is more restrictive,
> while many % and _ in the LIKE expression make the planner assume that the
> match is less restrictive.

Exactly.  It's not terribly bright, but the extra %'s make it think that
the pattern is less selective, producing a higher rowcount estimate that
-- by chance -- better matches reality.  Then you get better decisions
about how to shape the rest of the plan.
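One way to see that effect (a sketch only, reusing the pattern and the index
table from the plan above) is to compare the planner's row estimates for the
single- and doubled-% forms, which match exactly the same rows:

EXPLAIN SELECT * FROM c3k_document_index
WHERE lower(searchfield1) LIKE '%sehr%'
   OR lower(searchfield8) LIKE '%sehr%';

EXPLAIN SELECT * FROM c3k_document_index
WHERE lower(searchfield1) LIKE '%%sehr%%'
   OR lower(searchfield8) LIKE '%%sehr%%';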

You might get some traction by creating indexes on lower(searchfield1)
etc.  This isn't even necessarily with an expectation that the planner
would use those indexes in the plan ... but what it would do is make
use of the statistics that ANALYZE will accumulate about the indexed
expressions.  I think that would give you better estimates about the
LIKE rowcounts.  You might have to crank up the statistics target for
those indexes if the default isn't enough to make the estimates
significantly better.  (Obviously, don't forget to re-ANALYZE before
checking results.)
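A sketch of that sequence, assuming expression indexes like the ones above
have been created (raising default_statistics_target for the ANALYZE run is
just one way to do it):

-- The default statistics target is 100.
SET default_statistics_target = 1000;
ANALYZE c3k_document_index;
RESET default_statistics_target;
-- Note: a later (auto)analyze falls back to the default target unless
-- per-column targets are set explicitly.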

> That in mind, let me throw pg_trgm into the mix of things to try:
> https://www.postgresql.org/docs/current/static/pgtrgm.html
> The trigram module allows you to create indexes that LIKE can use
> to do index searches instead of always having to do sequential scans
> or push the LIKE matching to another part of the plan tree.

For cases like the above, where half the table is getting selected,
I would not expect the planner to use an index anyway.  Any old index
on lower(searchfieldN) will work in terms of inducing ANALYZE to
collect stats.  But if you have other queries with more selective
LIKE patterns then maybe a trigram index in particular is worthwhile.
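If such queries exist, the behaviour is easy to check once a trigram index is
in place (see the sketch after Bill's mail below); the pattern here is just a
made-up, presumably rare string:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM c3k_document_index
WHERE lower(searchfield1) LIKE '%rare-string%';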

regards, tom lane




Re: [GENERAL] Slow query plan used

2017-06-01 Thread Bill Moran
On Thu, 1 Jun 2017 16:45:17 +0200
Andreas Kretschmer  wrote:
> 
> On 01.06.2017 at 14:07, Wetzel, Juergen (Juergen) wrote:
> >
> > On 31.05.2017 at 13:27, Wetzel, Juergen (Juergen) wrote:
> >>>> Only 130 rows out of the 3 have ARCHIVED = 0
> >>> in this case I would suggest a partial index:
> >>> create index <indexname> on <tablename> (archived) where archived = 0;
> >> Thanks, Andreas.
> >>
> >> Sorry for the confusion about the table names.
> >> The hint with the partial index sounds as if it could solve the problem. I 
> >> will test it.
> >>
> > Hi,
> >
> > I created now a partial index
> > create index on document (archived) where archived = '0';
> 
> just to be sure: this syntax is wrong, missing index-name. But it seems 
> the index is document_archived_idx ...
> 
> > But the result is the same as before: a short like expression enclosed in 
> > doubled %-signs leads to a fast query plan, whereas a longer like expression 
> > or the use of a single %-sign creates a much slower query. Please see the 
> > query plans below. Most surprising to me is the influence of the like 
> > expression, especially the doubled %-sign on short expressions. Any other 
> > ideas how to speed up that query, or what is going on here in general?

LIKE queries are probably challenging to plan, especially when they're not
left-anchored: how can the planner be reasonably expected to estimate how
many rows will be matched by a given LIKE expression?

Not having looked at the code, I would guess that the length of the LIKE
expression will make the planner assume that the match is more restrictive,
while many % and _ in the LIKE expression make the planner assume that the
match is less restrictive. Extrapolate that into guessing a number of matched
tuples and how that fits into the overall plan and you'll probably give
yourself a brain aneurysm. While having a detailed understanding of exactly
how the planner makes such decisions is certainly worthwhile, I would
recommend a more pragmatic approach: try things and see what works.

That in mind, let me throw pg_trgm into the mix of things to try:
https://www.postgresql.org/docs/current/static/pgtrgm.html
The trigram module allows you to create indexes that LIKE can use
to do index searches instead of always having to do sequential scans
or push the LIKE matching to another part of the plan tree. Based on
your described situation, I have a theory that it might improve things
quite a bit.
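A sketch of what that could look like here (index names are illustrative and
the column names are taken from the posted plans; gin_trgm_ops is the trigram
operator class shipped with the extension):

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX document_index_sf1_trgm_idx
    ON document_index USING gin (lower(searchfield1) gin_trgm_ops);
CREATE INDEX document_index_sf8_trgm_idx
    ON document_index USING gin (lower(searchfield8) gin_trgm_ops);

ANALYZE document_index;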

-- 
Bill Moran 




Re: [GENERAL] Slow query plan used

2017-06-01 Thread Andreas Kretschmer


On 01.06.2017 at 14:07, Wetzel, Juergen (Juergen) wrote:


On 31.05.2017 at 13:27, Wetzel, Juergen (Juergen) wrote:

Only 130 rows out of the 3 have ARCHIVED = 0

in this case I would suggest a partial index:
create index <indexname> on <tablename> (archived) where archived = 0;

Thanks, Andreas.

Sorry for the confusion about the table names.
The hint with the partial index sounds as if it could solve the problem. I will 
test it.


Hi,

I created now a partial index
create index on document (archived) where archived = '0';


just to be sure: this syntax is wrong, missing index-name. But it seems 
the index is document_archived_idx ...





But the result is the same as before: a short like expression enclosed in doubled 
%-signs leads to a fast query plan, whereas a longer like expression or the use of 
a single %-sign creates a much slower query. Please see the query plans below. Most 
surprising to me is the influence of the like expression, especially the doubled 
%-sign on short expressions. Any other ideas how to speed up that query, or what is 
going on here in general?



please consider my plan B) and increase the stats. See my other mail.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: [GENERAL] Slow query plan used

2017-06-01 Thread Wetzel, Juergen (Juergen)


On 31.05.2017 at 13:27, Wetzel, Juergen (Juergen) wrote:
>>> Only 130 rows out of the 3 have ARCHIVED = 0
>> in this case I would suggest a partial index:
>> create index <indexname> on <tablename> (archived) where archived = 0;
> Thanks, Andreas.
>
> Sorry for the confusion about the table names.
> The hint with the partial index sounds as if it could solve the problem. I will 
> test it.
>

Hi,

I created now a partial index
create index on document (archived) where archived = '0'; 

But the result is the same as before: a short like expression enclosed in doubled 
%-signs leads to a fast query plan, whereas a longer like expression or the use of 
a single %-sign creates a much slower query. Please see the query plans below. Most 
surprising to me is the influence of the like expression, especially the doubled 
%-sign on short expressions. Any other ideas how to speed up that query, or what is 
going on here in general?

Thanks,
Jürgen

Limit  (cost=24327.12..24327.21 rows=38 width=662) (actual 
time=15373.542..15373.565 rows=136 loops=1)
  Buffers: shared hit=264747 read=51242 dirtied=6 written=16
  ->  Sort  (cost=24327.12..24327.21 rows=38 width=662) (actual 
time=15373.541..15373.557 rows=136 loops=1)
Sort Key: document.created, document.id
Sort Method: quicksort  Memory: 102kB
Buffers: shared hit=264747 read=51242 dirtied=6 written=16
->  Nested Loop  (cost=0.42..24326.12 rows=38 width=662) (actual 
time=10.951..15372.914 rows=136 loops=1)
  Buffers: shared hit=264741 read=51242 dirtied=6 written=16
  ->  Seq Scan on document_index  (cost=0.00..15403.68 rows=1257 
width=289) (actual time=0.205..14901.743 rows=38545 loops=1)
Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR 
(lower(searchfield8) ~~ '%sehr%'::text))
Rows Removed by Filter: 40531
Buffers: shared hit=123181 read=38272 dirtied=5 written=12
  ->  Index Scan using document_6720023941 on document  
(cost=0.42..7.09 rows=1 width=381) (actual time=0.011..0.011 rows=0 loops=38545)
Index Cond: (id = document_index.documentid)
Filter: (((folder)::text = ANY 
('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar) AND 
((umr_actualtopicid)::text = ANY 
('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737
 (...)
Rows Removed by Filter: 1
Buffers: shared hit=141560 read=12970 dirtied=1 written=4
Total runtime: 15373.763 ms


Limit  (cost=34194.37..34194.74 rows=150 width=662) (actual 
time=181.502..181.525 rows=134 loops=1)
  Buffers: shared hit=8022 read=277
  ->  Sort  (cost=34194.37..34194.83 rows=186 width=662) (actual 
time=181.501..181.516 rows=134 loops=1)
Sort Key: document.created, c3k_document.id
Sort Method: quicksort  Memory: 101kB
Buffers: shared hit=8022 read=277
->  Nested Loop  (cost=3546.02..34187.36 rows=186 width=662) (actual 
time=32.660..181.064 rows=134 loops=1)
  Buffers: shared hit=8022 read=277
  ->  Bitmap Heap Scan on document  (cost=3545.61..19272.79 
rows=2375 width=381) (actual time=22.771..96.683 rows=458 loops=1)
Recheck Cond: ((archived = '0'::bpchar) AND (ownerid = ANY 
('{5000239,5000238,5000234,5000113,5000237,5000236,5000230,5000112,5000233,5000111,5000232,13,15,16,18,19,5000249,5000246,5000124,5000245,5000127,5000247,5000242,5000120,5000123
 (...)
Rows Removed by Index Recheck: 15733
Filter: ((actualtopicid)::text = ANY 
('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737e9c4d41000900,3ecdec4d4100,4aaf905441000800,4aaf905441000e00,fc7e9c4d41000f00,11ffc
 (...)
Rows Removed by Filter: 27
Buffers: shared hit=5677
->  BitmapAnd  (cost=3545.61..3545.61 rows=6228 width=0) 
(actual time=22.056..22.056 rows=0 loops=1)
  Buffers: shared hit=2470
  ->  Bitmap Index Scan on document_archived_idx  
(cost=0.00..1131.17 rows=54784 width=0) (actual time=11.694..11.694 rows=60295 
loops=1)
Index Cond: (archived = '0'::bpchar)
Buffers: shared hit=184
  ->  Bitmap Index Scan on 
document_ownerid_folder_status_idx  (cost=0.00..2413.00 rows=8973 width=0) 
(actual time=8.718..8.718 rows=14962 loops=1)
Index Cond: ((ownerid = ANY 
('{5000239,5000238,5000234,5000113,5000237,5000236,5000230,5000112,5000233,5000111,5000232,13,15,16,18,19,5000249,5000246,5000124,5000245,5000127,5000247,5000242,5000120,5000123,5000244,5000122,50
 (...)
Buffers: shared hit=2286
  ->  Index Scan using document_7965268402 on document_index  
(cost=0.42..6.27 rows=1 width

Re: [GENERAL] Slow query plan used

2017-05-31 Thread Andreas Kretschmer



On 31.05.2017 at 13:27, Wetzel, Juergen (Juergen) wrote:

Only 130 rows out of the 3 have ARCHIVED = 0

in this case I would suggest a partial index:
create index <indexname> on <tablename> (archived) where archived = 0;

Thanks, Andreas.

Sorry for the confusion about the table names.
The hint with the partial index sounds as if it could solve the problem. I will 
test it.



you are welcome. please keep me informed if that helps you.

Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





Re: [GENERAL] Slow query plan used

2017-05-31 Thread Wetzel, Juergen (Juergen)
>> Only 130 rows out of the 3 have ARCHIVED = 0

> in this case I would suggest a partial index:
> create index <indexname> on <tablename> (archived) where archived = 0;

Thanks, Andreas.

Sorry for the confusion about the table names.
The hint with the partial index sounds as if it could solve the problem. I will 
test it.

Regards
Jürgen




Re: [GENERAL] Slow query plan used

2017-05-31 Thread Andreas Kretschmer



On 30.05.2017 at 10:42, Wetzel, Juergen (Juergen) wrote:

> I have a question concerning the query planner. I observe that the chosen
> query plan differs depending on the length and content of a like search
> expression. We have a view combining data from two tables, both containing
> the same number of rows (round about 3). The PostgreSQL version used is
> 9.3.15 on Windows. The DDL of the tables and the view is below.


the query is on DOCUMENTDATA, but the plan and DDL are for c3k_documentdata and 
other tables.


> Only 130 rows out of the 3 have ARCHIVED = 0

in this case I would suggest a partial index:

create index <indexname> on <tablename> (archived) where archived = 0;

You can also increase the statistics for the columns subject and mailContent:

ALTER TABLE <tablename> ALTER COLUMN <columnname> SET STATISTICS 1000;

(as an example; the default value is 100)
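For instance (illustrative names only; per the posted plans the searched text
lives in searchfield1/searchfield8 of the index table), followed by a fresh
ANALYZE so the new target takes effect:

-- Adjust the table/column names to the real schema behind the
-- DOCUMENTDATA view.
ALTER TABLE c3k_document_index ALTER COLUMN searchfield1 SET STATISTICS 1000;
ALTER TABLE c3k_document_index ALTER COLUMN searchfield8 SET STATISTICS 1000;
ANALYZE c3k_document_index;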


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



[GENERAL] Slow query plan used

2017-05-30 Thread Wetzel, Juergen (Juergen)
I have a question concerning the query planner. I observe that the chosen query 
plan differs depending on the length and content of a like search expression.
We have a view combining data from two tables, both containing the same number of 
rows (round about 3). The PostgreSQL version used is 9.3.15 on Windows.
The DDL of the tables and the view is below.

Query is:
select ID, OWNERID, FOLDER, RCVSERVICE, STATUS, TASKSTATUS, CREATED
from DOCUMENTDATA 
where  FOLDER in ('*INBOX','*DELAYED')  and 
   ARCHIVED='0' and 
   ( lower(subject) like lower('%Sehr%')  or  lower(mailContent) like 
lower('%Sehr%') ) and 
   UMR_ACTUALTOPICID in 
('f3fb34574100','804840564100','4fc81b554100','d27d9c4d41000200','e9aba54d4100','4aaf905441000a00','737e9c4d41000900',
 
'4aaf905441000800','3ecdec4d4100','4aaf905441000e00','fc7e9c4d41000f00','4aaf905441000c00','11ffc54f4100','4aaf905441000200')
 and 
   OWNERID in 
('5000239','5000238','5000234','5000113','5000237','5000236')
order by CREATED desc, ID desc limit 150 offset 0

Only 130 rows out of the 3 have ARCHIVED = 0 
The chosen query plan is the following:

Limit  (cost=22738.95..22739.00 rows=20 width=664) (actual 
time=13929.849..13929.869 rows=98 loops=1)
  Buffers: shared hit=221263 read=45723
  ->  Sort  (cost=22738.95..22739.00 rows=20 width=664) (actual 
time=13929.848..13929.863 rows=98 loops=1)
Sort Key: c3k_document.created, c3k_document.id
Sort Method: quicksort  Memory: 87kB
Buffers: shared hit=221263 read=45723
->  Nested Loop  (cost=0.42..22738.52 rows=20 width=664) (actual 
time=95.508..13929.478 rows=98 loops=1)
  Buffers: shared hit=221263 read=45723
  ->  Seq Scan on c3k_document_index  (cost=0.00..15160.48 
rows=1063 width=285) (actual time=0.206..13539.353 rows=33022 loops=1)
Filter: ((lower((searchfield1)::text) ~~ '%sehr%'::text) OR 
(lower(searchfield8) ~~ '%sehr%'::text))
Rows Removed by Filter: 33832
Buffers: shared hit=101130 read=33463
  ->  Index Scan using c3k_docume_6720023941 on c3k_document  
(cost=0.42..7.12 rows=1 width=387) (actual time=0.011..0.011 rows=0 loops=33022)
Index Cond: (id = c3k_document_index.documentid)
Filter: (((folder)::text = ANY 
('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar) AND 
((umr_actualtopicid)::text = ANY 
('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737
 (...)
Rows Removed by Filter: 1
Buffers: shared hit=120133 read=12260
Total runtime: 13930.186 ms

If the like expression is lower('%%Sehr%%')  for fields subject and 
mailContent, a much faster query plan is chosen:

Limit  (cost=24018.18..24018.42 rows=97 width=664) (actual time=61.110..61.130 
rows=98 loops=1)
  Buffers: shared hit=1961 read=598
  ->  Sort  (cost=24018.18..24018.42 rows=97 width=664) (actual 
time=61.109..61.122 rows=98 loops=1)
Sort Key: c3k_document.created, c3k_document.id
Sort Method: quicksort  Memory: 87kB
Buffers: shared hit=1961 read=598
->  Nested Loop  (cost=515.26..24014.98 rows=97 width=664) (actual 
time=5.193..60.851 rows=98 loops=1)
  Buffers: shared hit=1961 read=598
  ->  Bitmap Heap Scan on c3k_document  (cost=514.96..16137.34 
rows=1232 width=387) (actual time=2.137..10.754 rows=282 loops=1)
Recheck Cond: (((folder)::text = ANY 
('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar))
Filter: (((umr_actualtopicid)::text = ANY 
('{f3fb34574100,804840564100,4fc81b554100,d27d9c4d41000200,e9aba54d4100,4aaf905441000a00,737e9c4d41000900,4aaf905441000800,3ecdec4d4100,4aaf905441000e00,fc7e9c4d41000f00,11ff
 (...)
Rows Removed by Filter: 23
Buffers: shared hit=828 read=356
->  Bitmap Index Scan on 
c3k_document_folder_archived_umr_orgtopicid_idx  (cost=0.00..514.66 rows=6183 
width=0) (actual time=1.946..1.946 rows=2847 loops=1)
  Index Cond: (((folder)::text = ANY 
('{*INBOX,*DELAYED}'::text[])) AND (archived = '0'::bpchar))
  Buffers: shared hit=7 read=113
  ->  Index Scan using c3k_docume_7965268402 on c3k_document_index  
(cost=0.29..6.38 rows=1 width=285) (actual time=0.176..0.176 rows=0 loops=282)
Index Cond: (documentid = c3k_document.id)
Filter: ((lower((searchfield1)::text) ~~ '%%sehr%%'::text) 
OR (lower(searchfield8) ~~ '%%sehr%%'::text))
Rows Removed by Filter: 1
Buffers: shared hit=1133 read=242
Total runtime: 61.385 ms

But when the like expression is only a few characters longer, the first (slower) 
query plan is used again. I thou