Hi,

After the release of the PostgreSQL 18 version, I am trying non
determisinstic collation with LIKE pattern matching support. I am mostly
searching with "LIKE %search_term%" on about 10 text columns. As I use
wildcard prefix and suffix, I can't use btree index anyways. So I decided
to try non deterministic collation support so I can simplify application
code. I am testing this on a table with ~60K rows. With this row count and
search pattern, non deterministic collation seems at least 10 times slower
than LOWER LIKE and ILIKE. Tested collations are the same ICU tr-TR
collation with standard and custom based on same ICU tr-TR:
40960 | test3e                 |          2200 |        10 | i            |
f                   |           -1 |             |            |
tr-TR-u-ks-level2 |              | 153.128.46

Version:
etukimlik_admin=# select version();
                                                      version

--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
(1 row)

Table  structure with ICU deterministic collation:
etukimlik_admin=# \d "AccountActivityLogEntries"
                                                        Table
"public.AccountActivityLogEntries"
          Column          |           Type           |  Collation  |
Nullable |                                 Default

--------------------------+--------------------------+-------------+----------+--------------------------------------------------------------------------
 Id                       | integer                  |             | not
null | generated by default as identity
 Email                    | character varying(128)   | tr-TR-x-icu | not
null |
 LdapUid                  | character varying(512)   | tr-TR-x-icu |
   |
 Identity                 | character varying(24)    | tr-TR-x-icu |
   |
 StudentId                | text[]                   |             |
   |
 EmployeeId               | character varying(24)    | tr-TR-x-icu |
   |
 Name                     | character varying(1024)  | tr-TR-x-icu |
   |
 Surname                  | character varying(1024)  | tr-TR-x-icu |
   |
 AccountType              | text[]                   |             | not
null |
 Description              | character varying(4096)  | tr-TR-x-icu | not
null |
 ActivityId               | character varying(128)   | tr-TR-x-icu | not
null |
 LogTypeCode              | character varying(10)    | tr-TR-x-icu | not
null |
 LogType                  | character varying(64)    | tr-TR-x-icu | not
null |
 LogSubType               | character varying(64)    | tr-TR-x-icu | not
null |
 ActivityTime             | timestamp with time zone |             |
   |
 ActivitySource           | character varying(128)   | tr-TR-x-icu | not
null |
 ChangedAttribute         | character varying(128)   | tr-TR-x-icu |
   |
 ChangedAttributeOldValue | character varying(4096)  | tr-TR-x-icu |
   |
 ChangedAttributeNewValue | character varying(4096)  | tr-TR-x-icu |
   |
 DisplayName              | character varying(2049)  | tr-TR-x-icu |
   | generated always as (((("Name"::text || ' '::text) ||
"Surname"::text)))
Indexes:
    "PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")

LOWER LIKE query test:
etukimlik_admin=# explain analyze select * FROM "AccountActivityLogEntries"
where lower("Email") like '%duran%' or lower("LdapUid") like '%duran%' or
lower("Identity") like '%duran%' or lower("Name") like '%duran%' or
lower("Surname") like '%duran%' or lower("DisplayName") like '%duran%' or
lower("Description") like '%duran%' or lower("EmployeeId") like '%duran%'
or exists ( select from unnest("StudentId") AS value  WHERE lower(value)
LIKE '%duran%') or  lower("ActivityId") like '%duran%';
QUERY PLAN



-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "AccountActivityLogEntries"  (cost=0.00..16387.18 rows=31210
width=5251) (actual time=1.952..282.090 rows=102.00 loops=1)
   Filter: ((lower(("Email")::text) ~~ '%duran%'::text) OR
(lower(("LdapUid")::text) ~~ '%duran%'::text) OR (lower(("Identity")::text)
~~ '%duran%'::text) OR (lower(("Name")::text) ~~ '%duran%'::text) OR
(lower(("Surname")::text) ~~ '%duran%'::text) OR (lower((((("Name")::text
|| ' '::text) || ("Surname")::text))::text) ~~ '%duran%'::text) OR
(lower(("Description")::text) ~~ '%duran%'::text) OR
(lower(("EmployeeId")::text) ~~ '%duran%'::text) OR EXISTS(SubPlan 1) OR
(lower(("ActivityId")::text) ~~ '%duran%'::text))
   Rows Removed by Filter: 61438
   Buffers: shared hit=2922
   SubPlan 1
     ->  Function Scan on unnest value  (cost=0.00..0.15 rows=1 width=0)
(actual time=0.001..0.001 rows=0.00 loops=61438)
           Filter: (lower(value) ~~ '%duran%'::text)
           Rows Removed by Filter: 1
 Planning Time: 0.165 ms
 Execution Time: 282.143 ms
(10 rows)

ILIKE query test:
explain analyze select * FROM "AccountActivityLogEntries" where "Email"
ilike '%duran%' or "LdapUid" ilike '%duran%' or "Identity" ilike '%duran%'
or "Name" ilike '%duran%' or "Surname" ilike '%duran%' or "DisplayName"
ilike '%duran%' or "Description" ilike '%duran%' or "EmployeeId" ilike
'%duran%' or exists ( select from unnest("StudentId") AS value  WHERE value
iLIKE '%duran%') or  "ActivityId" ilike '%duran%';
QUERY PLAN



-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "AccountActivityLogEntries"  (cost=0.00..13464.03 rows=31210
width=5251) (actual time=3.782..417.573 rows=102.00 loops=1)
   Filter: ((("Email")::text ~~* '%duran%'::text) OR (("LdapUid")::text ~~*
'%duran%'::text) OR (("Identity")::text ~~* '%duran%'::text) OR
(("Name")::text ~~* '%duran%'::text) OR (("Surname")::text ~~*
'%duran%'::text) OR ((((("Name")::text || ' '::text) ||
("Surname")::text))::text ~~* '%duran%'::text) OR (("Description")::text
~~* '%duran%'::text) OR (("EmployeeId")::text ~~* '%duran%'::text) OR
EXISTS(SubPlan 1) OR (("ActivityId")::text ~~* '%duran%'::text))
   Rows Removed by Filter: 61438
   Buffers: shared hit=2922
   SubPlan 1
     ->  Function Scan on unnest value  (cost=0.00..0.13 rows=1 width=0)
(actual time=0.001..0.001 rows=0.00 loops=61438)
           Filter: (value ~~* '%duran%'::text)
           Rows Removed by Filter: 1
 Planning Time: 0.160 ms
 Execution Time: 417.652 ms

Table  structure with ICU non deterministic collation:
etukimlik_admin=# \d "AccountActivityLogEntries"
                                                       Table
"public.AccountActivityLogEntries"
          Column          |           Type           | Collation | Nullable
|                                 Default
--------------------------+--------------------------+-----------+----------+--------------------------------------------------------------------------
 Id                       | integer                  |           | not null
| generated by default as identity
 Email                    | character varying(128)   | test3e    | not null
|
 LdapUid                  | character varying(512)   | test3e    |
 |
 Identity                 | character varying(24)    | test3e    |
 |
 StudentId                | text[]                   |           |
 |
 EmployeeId               | character varying(24)    | test3e    |
 |
 Name                     | character varying(1024)  | test3e    |
 |
 Surname                  | character varying(1024)  | test3e    |
 |
 AccountType              | text[]                   |           | not null
|
 Description              | character varying(4096)  | test3e    | not null
|
 ActivityId               | character varying(128)   | test3e    | not null
|
 LogTypeCode              | character varying(10)    | test3e    | not null
|
 LogType                  | character varying(64)    | test3e    | not null
|
 LogSubType               | character varying(64)    | test3e    | not null
|
 ActivityTime             | timestamp with time zone |           |
 |
 ActivitySource           | character varying(128)   | test3e    | not null
|
 ChangedAttribute         | character varying(128)   | test3e    |
 |
 ChangedAttributeOldValue | character varying(4096)  | test3e    |
 |
 ChangedAttributeNewValue | character varying(4096)  | test3e    |
 |
 DisplayName              | character varying(2049)  | test3e    |
 | generated always as (((("Name"::text || ' '::text) || "Surname"::text)))
Indexes:
    "PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")

LIKE query test:
etukimlik_admin=# explain analyze select * FROM "AccountActivityLogEntries"
where "Email" like '%duran%' or "LdapUid" like '%duran%' or "Identity" like
'%duran%' or "Name" like '%duran%' or "Surname" like '%duran%' or
"DisplayName" like '%duran%' or "Description" like '%duran%' or
"EmployeeId" like '%duran%' or exists ( select from unnest("StudentId") AS
value  WHERE value LIKE '%duran%') or  "ActivityId" like '%duran%';
QUERY PLAN



--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on "AccountActivityLogEntries"  (cost=0.00..13464.03 rows=31210
width=5251) (actual time=18.466..2338.605 rows=102.00 loops=1)
   Filter: ((("Email")::text ~~ '%duran%'::text) OR (("LdapUid")::text ~~
'%duran%'::text) OR (("Identity")::text ~~ '%duran%'::text) OR
(("Name")::text ~~ '%duran%'::text) OR (("Surname")::text ~~
'%duran%'::text) OR ((((("Name")::text || ' '::text) ||
("Surname")::text))::text ~~ '%duran%'::text) OR (("Description")::text ~~
'%duran%'::text) OR (("EmployeeId")::text ~~ '%duran%'::text) OR
EXISTS(SubPlan 1) OR (("ActivityId")::text ~~ '%duran%'::text))
   Rows Removed by Filter: 61438
   Buffers: shared hit=2922
   SubPlan 1
     ->  Function Scan on unnest value  (cost=0.00..0.13 rows=1 width=0)
(actual time=0.001..0.001 rows=0.00 loops=61438)
           Filter: (value ~~ '%duran%'::text)
           Rows Removed by Filter: 1
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.176 ms
 Execution Time: 2338.670 ms

As you can see, setting deterministic false to the same ICU collation and
querying with "LIKE %search_term%" is way slower than using lower()
function. So why is it this slow? Is there any technical documentation that
explains this behaviour?

Regards,
Rahman Duran

Reply via email to