Re: [HACKERS] PATCH: CITEXT 2.0 v2

David E. Wheeler Mon, 07 Jul 2008 17:48:16 -0700

Thanks to help from RhodiumToad on IRC, I got some things improved here:


On Jul 7, 2008, at 16:24, David E. Wheeler wrote:

So for some reason, after adding the indexes, the queries againstthe CITEXT column aren't using them. Furthermore, the `lower(text)LIKE lower(?)` query isn't using *its* index. Huh?

I never knew what one needed to use the text_pattern_ops operatorclass to index a column for use with LIKE! I had no clue. Would thatwork for a citext column, too, since it's essentially the same as TEXT?

So this leaves me with two questions:
1. For what reason would the query against the citext column *not*use the index?

It turns out that it did use the index if I put `SET enable_seqscan =false;` into my script. So with RhodiumToad's direction, I added some`RESTRICT` and `JOIN` clauses to my comparison operators (copying themfrom ip4r). So now I have:


CREATE OPERATOR = (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    COMMUTATOR = =,
    NEGATOR    = <>,
    PROCEDURE  = citext_eq,
        RESTRICT   = eqsel,
        JOIN       = eqjoinsel,
    HASHES,
    MERGES
);

CREATE OPERATOR <> (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    NEGATOR    = =,
    COMMUTATOR = <>,
    PROCEDURE  = citext_ne,
        RESTRICT   = neqsel,
        JOIN       = neqjoinsel
);

CREATE OPERATOR < (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    NEGATOR    = >=,
    COMMUTATOR = >,
    PROCEDURE  = citext_lt,
        RESTRICT   = scalarltsel,
        JOIN       = scalarltjoinsel
);

CREATE OPERATOR <= (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    NEGATOR    = >,
    COMMUTATOR = <=,
    PROCEDURE  = citext_le,
        RESTRICT   = scalarltsel,
        JOIN       = scalarltjoinsel
);

CREATE OPERATOR >= (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    NEGATOR    = <,
    COMMUTATOR = <=,
    PROCEDURE  = citext_ge,
        RESTRICT   = scalargtsel,
        JOIN       = scalargtjoinsel
);

CREATE OPERATOR > (
    LEFTARG    = CITEXT,
    RIGHTARG   = CITEXT,
    NEGATOR    = <=,
    COMMUTATOR = <,
    PROCEDURE  = citext_gt,
        RESTRICT   = scalargtsel,
        JOIN       = scalargtjoinsel
);

With this change, the index was used:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 261.295 ms
SELECT * FROM try WHERE citext = 'food';
Time: 289.304 ms
Time: 1228.961 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 2.018 ms
SELECT * FROM try WHERE citext = 'food';
Time: 0.788 ms

Seems to be faster than the LOWER() version, too, which makes mehappy. The output from EXPLAIN ANALYZE:


try=# EXPLAIN ANALYZE SELECT * FROM try WHERE citext = 'food';

                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------

Index Scan using idx_try_citext on try (cost=0.00..8.31 rows=1width=119) (actual time=0.324..0.324 rows=0 loops=1)

   Index Cond: (citext = 'food'::citext)
 Total runtime: 0.377 ms
(3 rows)

try=# EXPLAIN ANALYZE SELECT * FROM try WHERE LOWER(text) =LOWER('food');


                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on try (cost=28.17..1336.10 rows=500 width=119)(actual time=0.170..0.170 rows=0 loops=1)

   Recheck Cond: (lower(text) = 'food'::text)

-> Bitmap Index Scan on idx_try_text (cost=0.00..28.04 rows=500width=0) (actual time=0.168..0.168 rows=0 loops=1)

         Index Cond: (lower(text) = 'food'::text)
 Total runtime: 0.211 ms
(5 rows)

So my only other question related to this is:

* Are the above RESTRICT and JOIN functions the ones to use, or isthere some way to make use of those used by the TEXT type that wouldbe more appropriate?

2. Is there some way to get the CITEXT index to behave like aLOWER() index, that is, so that its value is stored using the resultof the str_tolower() function, thus removing some of the overhead ofconverting the values for each row fetched from the index? (Doesthis question make any sense?)

Given the performance with an index, I think that this is moot, yes?There is, of course, much more overhead for a table scan.


Best,

David


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: CITEXT 2.0 v2

Reply via email to