Re: [HACKERS] How to implement Gin method?

2013-07-08 Thread kenji uno
Hi.

Ok, ok thanks.

My problem is to shorten time of searching full text stored in text field.

The table definition is like following:

 CREATE TABLE xxx
 (
 ...
   title character varying,
 ...
   fts1body text,
 ...
 )

If user requests keywords, we use a kind of stristr that is targeting 
Japanese text encoded in UTF-8.

aaa bbb ccc [Click here to search!]

 SELECT * FROM xxx 
 WHERE TRUE
 AND (ddstrike(title,'aaa') OR ddstrike(fts1body,'aaa') OR ...)
 AND (ddstrike(title,'bbb') OR ddstrike(fts1body,'bbb') OR ...)
 AND ...

As you can imagine easily, yes, it is very slow!

So I need trial and error for speeding up.

My trial is Insert a light weight filter done by integer key, before text 
searching!

For example,
 filter('A') - 1
 filter('B') - 2
 filter('C') - 4
 filter('AAABBC') - 7 or {1,2,4}

It may fit to inverse index like GIN!

So I began to study GIN.

I'm sorry to say. Today I found I could apply int array GIN support at 
contrib/_int.sql.

I made GIN index.

 CREATE INDEX xxx_idx_filter ON xxx USING GIN (filter(fts1body) gist__int_ops);


The following sample query is very very fast! 11065 hits in 22 milli secs 
(total 215,278 records).

 SELECT COUNT(*) FROM xxx WHERE filter(fts1body) @ filter('ABC');


However the following query is very slow! 9,400ms. It uses Seq Scan lol.

 SELECT * FROM xxx 
 WHERE TRUE
 AND (ddstrike(title,'ABC') OR (filter(fts1body) @ filter('AAA') AND 
ddstrike(fts1body,'AAA')))


Apply filter to title column too.

The best query result costs 3,700ms. 18 hits. It surely uses expected query 
plan: two Bitmap index scan - Bitmap Or - Bitmap Heap Scan.

 SELECT * FROM xxx 
 WHERE TRUE
 AND (filter(title) @ filter('ABC') OR filter(fts1body) @ filter('ABC')) 
 AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC'))


The pure query costs 3,800ms. 18 hits. Single Seq Scan.

 SELECT * FROM xxx
 WHERE TRUE
 AND (ddstrike(title,'ABC') OR ddstrike(fts1body,'ABC'))


Finally I noticed I had spent useless time, and need to find another good one.

Sorry.


However, I may think good idea which uses inverted index.

So I want to know...
- the actual work of extractQuery and consistant.
- the detail interface of extractValue/extractQuery/consistant. It may help 
understanding.


I looked at contrib/_int.sql of PG8.2.22

There are definitions of int[] GIN support.

---
CREATE OPERATOR CLASS gin__int_ops
FOR TYPE _int4 USING gin
AS
OPERATOR3   ,
OPERATOR6   = (anyarray, anyarray)  RECHECK,
OPERATOR7   @,
OPERATOR8   @  RECHECK,
OPERATOR13  @,
OPERATOR14  ~   RECHECK,
OPERATOR20  @@ (_int4, query_int),
FUNCTION1   btint4cmp (int4, int4),
FUNCTION2   ginarrayextract (anyarray, internal),
FUNCTION3   ginint4_queryextract (internal, internal, int2),
FUNCTION4   ginint4_consistent (internal, int2, internal),
STORAGE int4;
---

I checked the PG8.2.22 source code.

Both ginint4_queryextract and ginint4_consistent assume that query argument 
is a PGARRAY (ArrayType *). Where is it decided? Is it array of STORAGE type?

Both extractQuery(ginint4_queryextract) and extractValue(ginarrayextract) seem 
to return similar value type. They return Datum array of int4. Is it array of 
STORAGE type?

I want to understand the overview of GIN extension.

Thanks

kenji uno



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] How to implement Gin method?

2013-07-06 Thread Kenji uno
Hi.

I want to try GIN and know programming information of GIN technology.

Please teach me about these functions extractValue, extractQuery and consistent.

I have posted question at stack overflow.

http://stackoverflow.com/questions/17489703/postgresql-how-to-implement-gin

Please help my question.

Thanks

Kenji uno

Windows Phoneから送信