On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
> we faced an issue with a SELECT query on a relatively large table in our
> database. The query involves one single table. The table has more than
> 10 million records. It's mainly composed of varchar fields, has a primary
> key (id) of type serial, and when records of this table are shown to users,
> they are sorted using 2 fields: display_name (varchar) and id (the primary
> key). Because this table is heavily used in various contexts in our
> application, we have multiple indexes on it. Among other indexes, we have
> GIN indexes on some fields of the table.
> 
> The btree index res_partner_displayname_id_idx has been added recently and
> perfectly matches a criterion (where active) and the sorting (display_name,
> id) that we have in almost all our queries on this table.
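> Roughly, its definition is as follows (a simplified sketch; the actual
> DDL may differ):
>
>   -- partial btree index covering the common filter and sort order
>   -- (sketch; simplified from the description above)
>   CREATE INDEX res_partner_displayname_id_idx
>       ON res_partner (display_name, id)
>       WHERE active;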
> 
> The query that causes the issue is this one:
> SELECT "res_partner"."id"
>   FROM "res_partner"
>  WHERE (("res_partner"."active" = true) AND
>          (((((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
>             OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
>            OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse%')))
>           OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
>          OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse%'))))
>    AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL)
>         OR "res_partner"."type" IS NULL)
>  ORDER BY "res_partner"."display_name", "res_partner"."id"
>  LIMIT 100
> 
> We have the common criterion (active = true), the common sorting, a limit,
> and a search on various fields. The fields on which we're searching with
> criteria like '%whatever%' are GIN-indexed.
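> For a GIN index to support such ILIKE '%...%' searches it has to be a
> pg_trgm trigram index, along these lines (a sketch; the real names and
> indexed expressions may differ):
>
>   CREATE EXTENSION IF NOT EXISTS pg_trgm;
>   -- one trigram GIN index per searched field; to match the
>   -- unaccent(...) conditions in the query, the index would have to be
>   -- defined on the same (immutable) expression
>   CREATE INDEX res_partner_display_name_gin_idx
>       ON res_partner USING gin (display_name gin_trgm_ops);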
> 
>  Here is the query plan:
>
>  Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
>    Output: id, display_name
>    Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>    ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)
>          Output: id, display_name
>          Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)))
>          Rows Removed by Filter: 6226870
>          Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
>  Planning Time: 0.891 ms
>  Execution Time: 86509.070 ms
> (10 rows)
> 
> It's not using our GIN indexes at all, but the btree one.

The problem is that PostgreSQL estimates that the index scan will return
11214 rows, when it actually returns only one.  That makes the plan that
scans the table with the index matching the ORDER BY clause look appealing:
the planner expects to find 100 matching rows quickly and avoid a sort.
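You can see that in the cost estimates: the Limit node is costed at about
0.56 + (1200212.37 - 0.56) * 100/11214 ≈ 10703.36, i.e. the planner expects
to stop after reading roughly 100/11214 of the index scan.  With only one
actual match, it instead scans and filters almost the whole index.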

You can try to improve the estimates with more detailed statistics.
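For example, by raising the statistics targets on the searched columns
(a sketch; the choice of columns and the target of 1000 are guesses on my
part, and ANALYZE has to be re-run afterwards):

  ALTER TABLE res_partner ALTER COLUMN display_name SET STATISTICS 1000;
  ALTER TABLE res_partner ALTER COLUMN email SET STATISTICS 1000;
  ANALYZE res_partner;

Conditions like unaccent(col) ILIKE '%nse%' remain hard to estimate even
then, because they apply an expression to the column.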
If that doesn't do the job, you can modify the ORDER BY clause so
that it cannot use the bad index:

  ORDER BY res_partner.display_name, res_partner.id + 0
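The "+ 0" turns the second sort key into an expression, so it no longer
matches the indexed column, and the planner cannot use
res_partner_displayname_id_idx to produce the requested order.  It then
has to pick another plan, typically a bitmap scan on the trigram indexes
followed by an explicit sort of the few matching rows.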

Yours,
Laurenz Albe

