Alban Hertroys <[email protected]> wrote:
>> What is the output of explain?
>>
>> You say 'the other table', so presumably we're dealing with a foreign key
>> here. Is there an index on that column?
Albe Laurenz wrote:
>> Is the index used for "where code ~ '^ABC3563'"?
>>
>> If not, then the result is fast only because the table is scanned only once,
>> and it's just the factor of 3000 that's killing you.
>>
>> The second query (where code ~ wantcode) can never use an index because
>> the pattern "wantcode" is unknown at query planning time.
>>
>> Yours,
>> Laurenz Albe
Here I created a subset (just number and code matching a certain prefix)
\d items
Table "pg_temp_1.items"
Column | Type | Modifiers
--------+-----------------------+-----------
num | integer |
code | character varying(40) |
create index itemsc on items (code);
select count(*) from items;
count
-------
9614
A single anchored query
select * from items where code ~ '^ABC';
does indeed use the index to retrieve data.
Next I copied a file of wanted codes
create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp
the file contains plain names, i.e. unanchored matches
explain analyze select num, n.wantcode from items, n where items.code ~
n.wantcode;
Nested Loop (cost=20.00..216502.14 rows=48070 width=36) (actual
time=148.479..336280.488 rows=2871 loops=1)
Join Filter: (("outer".code)::text ~ "inner".wantcode)
-> Seq Scan on items (cost=0.00..167.14 rows=9614 width=42) (actual
time=0.048..38.666 rows=9614 loops=1)
-> Materialize (cost=20.00..30.00 rows=1000 width=32) (actual
time=0.001..1.049 rows=815 loops=9614)
-> Seq Scan on n (cost=0.00..20.00 rows=1000 width=32) (actual
time=0.003..1.839 rows=815 loops=1)
Total runtime: 336286.692 ms
An exact match "where items.code = n.wantcode" on the same data completes in
40 ms
BTW: indexing the second table does not affect the query plan or the runtime,
it just shows
actual row count rather than estimate.
This is, of course, bad; an anchored match could be faster and also is more
appropriate
to the scenario. So I change the contents of the second table
update n set wantcode = textcat('^', wantcode);
and try again, with similar results
Nested Loop (cost=14.15..176478.01 rows=39178 width=36) (actual
time=125.114..308831.697 rows=2871 loops=1)
Join Filter: (("outer".code)::text ~ "inner".wantcode)
-> Seq Scan on items (cost=0.00..167.14 rows=9614 width=42) (actual
time=0.061..2034.572 rows=9614 loops=1)
-> Materialize (cost=14.15..22.30 rows=815 width=32) (actual
time=0.001..1.095 rows=815 loops=9614)
-> Seq Scan on n (cost=0.00..14.15 rows=815 width=32) (actual
time=0.114..1.893 rows=815 loops=1)
Total runtime: 308837.746 ms
I am aware that this is unlikely to work fast (the planner would perhaps need a
hint in the query
rather than in the data column to choose an anchored match algorithm (in case
there is
such an algo, of course)
So I wonder whether there might be a different approach to this problem rather
than
pattern matching.
I recall I had a similar problem before with a "contacts" column possibly
containing one or more
email addresses. Here searches would also be number of people times number of
requests
performance. I finally ended up with a @@ match (contrib/tsquery) and a
supporting GIST index,
but that only supports exact match, not prefix
Regards
Wolfgang Hamann
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general