Re: Problem with indices from 10 to 13

2021-09-28 Thread Justin Pryzby
On Wed, Sep 29, 2021 at 02:11:15AM +, Daniel Diniz wrote:
> How do I increase the statistics target for h.nome_des?
> And why is there this difference between loading the dump into 10 and into 13?

Like this: ALTER TABLE h ALTER nome_des SET STATISTICS 2000; ANALYZE h;
https://www.postgresql.org/docs/current/sql-altertable.html
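
Spelled out, a minimal sketch (2000 is just an example target, "hawbs" is the table behind the alias h in your plan, and the ANALYZE is what makes the new target take effect):

    ALTER TABLE hawbs ALTER COLUMN nome_des SET STATISTICS 2000;
    ANALYZE hawbs;

    -- then re-check the row estimate, e.g.:
    EXPLAIN SELECT * FROM hawbs WHERE nome_des ILIKE '%STEPHANY STOEW LEANDRO%';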

-- 
Justin




Re: Problem with indices from 10 to 13

2021-09-28 Thread Tom Lane
Daniel Diniz  writes:
> The index I use is the GIN.

pg_trgm, you mean?  That answers one question, but you still didn't
explain what type h.nome_des is, nor how bytea and convert_from()
are getting into the picture.

The second part of that is probably not critical, since the planner
should be willing to reduce the convert_from() call to a constant
for planning purposes, so I'm unclear as to why the estimate for
the ilike clause is so bad.  Have you tried increasing the statistics
target for h.nome_des to see if the estimate gets better?

regards, tom lane




Re: Problem with indices from 10 to 13

2021-09-28 Thread Alan Hodgson
On Tue., Sep. 28, 2021 at 12:40, Daniel Diniz wrote:
> > Hello, I migrated from Postgres 10 to 13 and noticed a big increase in the
> > run time of a query I use. I ran EXPLAIN on 10 and on 13 and the difference
> > is absurd; the indexes and data are the same in both. I've re-created and
> > re-indexed, but I don't know what changed from 10 to 13 that made the
> > performance so bad, or whether some extra parameter is needed in some conf
> > on 13.
> > 
> > Postgres 13
> > 
> > "QUERY PLAN"
> > "Limit  (cost=1.13..26855.48 rows=30 width=137) (actual
> > time=10886.585..429803.463 rows=4 loops=1)"
> > "  ->  Nested Loop  (cost=1.13..19531164.71 rows=21819 width=137) (actual
> > time=10886.584..429803.457 rows=4 loops=1)"
> > "        Join Filter: (h.ult_eve_id = ev.evento_id)"
> > "        Rows Removed by Join Filter: 252"
> > "        ->  Nested Loop  (cost=1.13..19457514.32 rows=21819 width=62)
> > (actual time=10886.326..429803.027 rows=4 loops=1)"
> > "              ->  Nested Loop  (cost=0.85..19450780.70 rows=21819
> > width=55) (actual time=10886.259..429802.908 rows=4 loops=1)"
> > "                    ->  Index Scan Backward using hawbs_pkey on hawbs h
> >  (cost=0.57..19444209.67 rows=21819 width=46) (actual
> > time=10886.119..429802.676 rows=4 loops=1)"
> > "                          Filter: ((tipo_hawb_id = ANY
> > ('{1,10,3}'::integer[])) AND ((nome_des)::text ~~*
> > convert_from('\x255354455048414e592053544f4557204c45414e44524f25'::bytea,
> > 'LATIN1'::name)))"
> > "                          Rows Removed by Filter: 239188096"
> 
> Index Scan Backward looks suspicious to me.
> 239,188,096 rows removed by the filter is a lot of work.
> 
> Did you run ANALYZE?

PostgreSQL has an unfortunate love of scanning the pkey index backwards when
you use LIMIT.

Try pushing your actual query into a subquery (with an OFFSET 0 to prevent it
from being optimized away) and then do the LIMIT outside it.
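
Roughly along these lines (just a sketch; the select list, the join to the
events table, and the ORDER BY all have to come from your real query, which
isn't shown in full here, so the names below are guesses based on the plan):

    SELECT *
    FROM (
        SELECT h.*, ev.*                                  -- your real select list here
        FROM hawbs h
        JOIN eventos ev ON ev.evento_id = h.ult_eve_id    -- "eventos" is a guess at the ev table's name
        WHERE h.tipo_hawb_id IN (1, 10, 3)
          AND h.nome_des ILIKE '%STEPHANY STOEW LEANDRO%'
        ORDER BY h.hawb_id DESC                           -- whatever ordering your query actually uses
        OFFSET 0                                          -- fence: keeps the subquery from being flattened into the outer query
    ) sub
    LIMIT 30;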



Re: Problem with indices from 10 to 13

2021-09-28 Thread Tom Lane
Daniel Diniz  writes:
> Hello, I migrated from Postgres 10 to 13 and noticed a big increase in the
> run time of a query I use. I ran EXPLAIN on 10 and on 13 and the difference
> is absurd; the indexes and data are the same in both. I've re-created and
> re-indexed, but I don't know what changed from 10 to 13 that made the
> performance so bad, or whether some extra parameter is needed in some conf
> on 13.

This complaint is missing an awful lot of supporting information.

> "->  Bitmap Heap Scan on hawbs h  
> (cost=1058.34..26261.32 rows=21451 width=46) (actual time=201.956..201.966 
> rows=4 loops=1)"
> "  Recheck Cond: ((nome_des)::text ~~* 
> convert_from('\x255354455048414e592053544f4557204c45414e44524f25'::bytea, 
> 'LATIN1'::name))"
> "  Filter: (tipo_hawb_id = ANY 
> ('{1,10,3}'::integer[]))"
> "  Heap Blocks: exact=4"
> "  ->  Bitmap Index Scan on idx_nome_des  
> (cost=0.00..1052.98 rows=22623 width=0) (actual time=201.942..201.943 rows=4 
> loops=1)"
> "Index Cond: ((nome_des)::text 
> ~~* convert_from('\x255354455048414e592053544f4557204c45414e44524f25'::bytea, 
> 'LATIN1'::name))"

For starters, how in the world did you get that query condition out of

> where h.nome_des ilike '%STEPHANY STOEW LEANDRO%'

?  What data type is h.nome_des, anyway?  And what kind of index
is that --- it couldn't be a plain btree, because we wouldn't consider
~~* to be indexable by a btree.
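
For reference, the usual way ~~* becomes indexable is a pg_trgm index; if
that's what idx_nome_des is (as the rest of the thread suggests), it was
presumably created along these lines, assuming nome_des is a text or varchar
column on hawbs:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX idx_nome_des ON hawbs USING gin (nome_des gin_trgm_ops);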

However, the long and the short of it is that this rowcount estimate
is off by nearly four orders of magnitude (21451 estimated vs. 4
actual is pretty awful).  It's probably just luck that you got an
acceptable plan out of v10, and bad luck that you didn't get one
out of v13 --- v13's estimate is not better, but it's not much
worse either.  You need to do something about improving that
estimate if you'd like reliable query planning.  Since I'm not
too sure which operator you're actually invoking, it's hard to
offer good advice about how hard that might be.

regards, tom lane




Re: Problem with indices from 10 to 13

2021-09-28 Thread Ranier Vilela
On Tue., Sep. 28, 2021 at 12:40, Daniel Diniz <dan...@flashcourier.com.br> wrote:

> Hello, I migrated from Postgres 10 to 13 and noticed a big increase in the
> run time of a query I use. I ran EXPLAIN on 10 and on 13 and the difference
> is absurd; the indexes and data are the same in both. I've re-created and
> re-indexed, but I don't know what changed from 10 to 13 that made the
> performance so bad, or whether some extra parameter is needed in some conf
> on 13.
>
> Postgres 13
>
> "QUERY PLAN"
> "Limit  (cost=1.13..26855.48 rows=30 width=137) (actual
> time=10886.585..429803.463 rows=4 loops=1)"
> "  ->  Nested Loop  (cost=1.13..19531164.71 rows=21819 width=137) (actual
> time=10886.584..429803.457 rows=4 loops=1)"
> "Join Filter: (h.ult_eve_id = ev.evento_id)"
> "Rows Removed by Join Filter: 252"
> "->  Nested Loop  (cost=1.13..19457514.32 rows=21819 width=62)
> (actual time=10886.326..429803.027 rows=4 loops=1)"
> "  ->  Nested Loop  (cost=0.85..19450780.70 rows=21819
> width=55) (actual time=10886.259..429802.908 rows=4 loops=1)"
> "->  Index Scan Backward using hawbs_pkey on hawbs h
>  (cost=0.57..19444209.67 rows=21819 width=46) (actual
> time=10886.119..429802.676 rows=4 loops=1)"
> "  Filter: ((tipo_hawb_id = ANY
> ('{1,10,3}'::integer[])) AND ((nome_des)::text ~~*
> convert_from('\x255354455048414e592053544f4557204c45414e44524f25'::bytea,
> 'LATIN1'::name)))"
> "  Rows Removed by Filter: 239188096"
>
Index Scan Backward looks suspicious to me.
239,188,096 rows removed by the filter is a lot of work.

Did you run ANALYZE?

regards,
Ranier Vilela


Re: hashjoins, index loops to retrieve pk/ux constrains in pg12

2021-09-28 Thread Tom Lane
Arturas Mazeika  writes:
> Thanks a lot for having a look at the query once again in more detail. In
> short, you are right: I ran the Liquibase scripts and observed the exact
> query that was hanging in pg_stat_activity. The query was:

> SELECT
>     FK.TABLE_NAME       as "TABLE_NAME"
>   , CU.COLUMN_NAME      as "COLUMN_NAME"
>   , PK.TABLE_NAME       as "REFERENCED_TABLE_NAME"
>   , PT.COLUMN_NAME      as "REFERENCED_COLUMN_NAME"
>   , C.CONSTRAINT_NAME   as "CONSTRAINT_NAME"
> FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
> INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
>     ON C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
> INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK
>     ON C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
> INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU
>     ON C.CONSTRAINT_NAME = CU.CONSTRAINT_NAME
> INNER JOIN (
>     SELECT
>         i1.TABLE_NAME
>       , i2.COLUMN_NAME
>     FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
>     INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2
>         ON i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
>     WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
> ) PT ON PT.TABLE_NAME = PK.TABLE_NAME
> WHERE lower(FK.TABLE_NAME) = 'secrole_condcollection'

TBH, before worrying about performance you should be worrying about
correctness.  constraint_name alone is not a sufficient join key
for these tables, so who's to say whether you're even getting the
right answers?

Per SQL spec, the join key to use is probably constraint_catalog
plus constraint_schema plus constraint_name.  You might say you
don't need to compare constraint_catalog because that's fixed
within any one Postgres database, and that observation would be
correct.  But you can't ignore the schema.
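
Concretely, each of those joins would need both columns in its join
condition, along these lines (a sketch of just one ON clause):

    ... INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK
            ON  C.CONSTRAINT_SCHEMA = FK.CONSTRAINT_SCHEMA
            AND C.CONSTRAINT_NAME   = FK.CONSTRAINT_NAME
    ...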

What's worse, the SQL-spec join keys are based on the assumption that
constraint names are unique within schemas, which is not enforced in
Postgres.  Maybe you're all right here, because you're only looking
at primary key constraints, which are associated with indexes, which
being relations do indeed have unique-within-schema names.  But you
still can't ignore the schema.

On the whole I don't think you're buying anything by going through
the SQL-spec information views, because this query is clearly pretty
dependent on Postgres-specific assumptions even if it looks like it's
portable.  And you're definitely giving up a lot of performance, since
those views have so many complications from trying to map the spec's
view of whats-a-constraint onto the Postgres objects (not to mention
the spec's arbitrary opinions about which objects you're allowed to
see). This query would probably be simpler, more correct, and a
lot faster if rewritten to query the Postgres catalogs directly.
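
A rough sketch of that kind of rewrite (not a drop-in replacement for the
Liquibase query; it assumes the goal is simply to list one table's
foreign-key columns alongside the columns they reference, and it skips
schema-qualifying the table name):

    SELECT con.conname        AS "CONSTRAINT_NAME",
           fk_tbl.relname     AS "TABLE_NAME",
           fk_col.attname     AS "COLUMN_NAME",
           pk_tbl.relname     AS "REFERENCED_TABLE_NAME",
           pk_col.attname     AS "REFERENCED_COLUMN_NAME"
    FROM pg_constraint con
    JOIN pg_class fk_tbl ON fk_tbl.oid = con.conrelid
    JOIN pg_class pk_tbl ON pk_tbl.oid = con.confrelid
    -- conkey/confkey are parallel arrays of column numbers; unnest them together to pair them up
    CROSS JOIN LATERAL unnest(con.conkey, con.confkey) AS cols(fk_attnum, pk_attnum)
    JOIN pg_attribute fk_col ON fk_col.attrelid = con.conrelid  AND fk_col.attnum = cols.fk_attnum
    JOIN pg_attribute pk_col ON pk_col.attrelid = con.confrelid AND pk_col.attnum = cols.pk_attnum
    WHERE con.contype = 'f'                              -- foreign-key constraints only
      AND lower(fk_tbl.relname) = 'secrole_condcollection';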

regards, tom lane