Hi.
Ok, ok thanks.
My problem is to shorten time of searching full text stored in text field.
The table definition is like following:
CREATE TABLE xxx
(
...
title character varying,
...
fts1body text,
...
)
If user requests keywords, we use a kind of stristr that is targeting
Japanese text encoded in UTF-8.
aaa bbb ccc [Click here to search!]
SELECT * FROM xxx
WHERE TRUE
AND (ddstrike(title,'aaa') OR ddstrike(fts1body,'aaa') OR ...)
AND (ddstrike(title,'bbb') OR ddstrike(fts1body,'bbb') OR ...)
AND ...
As you can imagine easily, yes, it is very slow!
So I need trial and error for speeding up.
My trial is Insert a light weight filter done by integer key, before text
searching!
For example,
filter('A') - 1
filter('B') - 2
filter('C') - 4
filter('AAABBC') - 7 or {1,2,4}
It may fit to inverse index like GIN!
So I began to study GIN.
I'm sorry to say. Today I found I could apply int array GIN support at
contrib/_int.sql.
I made GIN index.
CREATE INDEX xxx_idx_filter ON xxx USING GIN (filter(fts1body) gist__int_ops);
The following sample query is very very fast! 11065 hits in 22 milli secs
(total 215,278 records).
SELECT COUNT(*) FROM xxx WHERE filter(fts1body) @ filter('ABC');
However the following query is very slow! 9,400ms. It uses Seq Scan lol.
SELECT * FROM xxx
WHERE TRUE
AND (ddstrike(title,'ABC') OR (filter(fts1body) @ filter('AAA') AND
ddstrike(fts1body,'AAA')))
Apply filter to title column too.
The best query result costs 3,700ms. 18 hits. It surely uses expected query
plan: two Bitmap index scan - Bitmap Or - Bitmap Heap Scan.
SELECT * FROM xxx
WHERE TRUE
AND (filter(title) @ filter('ABC') OR filter(fts1body) @ filter('ABC'))
AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC'))
The pure query costs 3,800ms. 18 hits. Single Seq Scan.
SELECT * FROM xxx
WHERE TRUE
AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC'))
Finally I noticed I had spent useless time, and need to find another good one.
Sorry.
However, I may think good idea which uses inverted index.
So I want to know...
- the actual work of extractQuery and consistant.
- the detail interface of extractValue/extractQuery/consistant. It may help
understanding.
I looked at contrib/_int.sql of PG8.2.22
There are definitions of int[] GIN support.
---
CREATE OPERATOR CLASS gin__int_ops
FOR TYPE _int4 USING gin
AS
OPERATOR3 ,
OPERATOR6 = (anyarray, anyarray) RECHECK,
OPERATOR7 @,
OPERATOR8 @ RECHECK,
OPERATOR13 @,
OPERATOR14 ~ RECHECK,
OPERATOR20 @@ (_int4, query_int),
FUNCTION1 btint4cmp (int4, int4),
FUNCTION2 ginarrayextract (anyarray, internal),
FUNCTION3 ginint4_queryextract (internal, internal, int2),
FUNCTION4 ginint4_consistent (internal, int2, internal),
STORAGE int4;
---
I checked the PG8.2.22 source code.
Both ginint4_queryextract and ginint4_consistent assume that query argument
is a PGARRAY (ArrayType *). Where is it decided? Is it array of STORAGE type?
Both extractQuery(ginint4_queryextract) and extractValue(ginarrayextract) seem
to return similar value type. They return Datum array of int4. Is it array of
STORAGE type?
I want to understand the overview of GIN extension.
Thanks
kenji uno
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers