Re: [PERFORM] For full text indexing, which is better, tsearch2 or

Steve Atkins Thu, 27 Nov 2003 22:24:10 -0800

On Wed, Nov 26, 2003 at 09:12:30PM -0800, Steve Atkins wrote:
> On Thu, Nov 27, 2003 at 12:41:59PM +0800, Christopher Kings-Lynne wrote:
> > >Does anyone have any metrics on how fast tsearch2 actually is?
> > >
> > >I tried it on a synthetic dataset of a million documents of a hundred
> > >words each and while insertions were impressively fast I gave up on
> > >the search after 10 minutes.
> > >
> > >Broken? Unusable slow? This was on the last 7.4 release candidate.
> > 
> > I just created a 1.1million row dataset by copying one of our 30000 row 
> > production tables and just taking out the txtidx column.  Then I 
> > inserted it into itself until it had 1.1 million rows.
> > 
> > Then I created the GiST index - THAT took forever - seriously like 20 
> > mins or half an hour or something.
> > 
> > Now, to find a word:
> > 
> > select * from tsearchtest where ftiidx ## 'curry';
> > Time: 9760.75 ms
> 
> > So, I have no idea why you think it's slow?  Perhaps you forgot the 
> > 'create index using gist' step?
> 
> No, it was indexed.
> 
> Thanks, that was the datapoint I was looking for. It _can_ run fast, so
> I just need to work out what's going on. (It's hard to diagnose a slow
> query when you've no idea whether it's really 'slow').


Looking at it further, something is very broken, possibly with GIST
indices, possibly with tsearch2s use of 'em.

This is on a newly built 7.4 installation, built with 64 bit
datetimes, but completely stock other than that. Stock gcc 3.3.2,
Linux, somewhat elderly 2.4.18 kernel. Running on a 1.5GHz single
processor Athlon with a half gig of RAM. Configuration set to use 20%
of RAM as shared buffers (amongst other settings, this was the last of
a range I tried looking for variation).

Software RAID0 across two 7200RPM SCSI drives, reiserfs (it's a
development box, not a production system). System completely idle
apart from postgresql.

269000 rows, each row having 400 words. Analyzed.

Running the select query given below appears to pause a process trying
to insert into the table completely (locking issue? I/O bandwidth?).

top shows the select below consuming <2% of CPU and iostat shows it reading
~2800 blocks/second from each of the two RAID drives.

Physical size of the database is under 3 gigs, including toast and index
tables.

The select query takes around 6 minutes (consistently, even if the same
identical query is repeated).

For entertainment, I turned off indexscan and the query takes 1
minute with a simple seqscan.

Any thoughts?

Cheers,
  Steve

=> select count(*) from ftstest;
 count  
--------
 269000
(1 row)

=> \d ftstest
                            Table "public.ftstest"
 Column |   Type   |                        Modifiers                         
--------+----------+----------------------------------------------------------
 idx    | integer  | not null default nextval('public.ftstest_idx_seq'::text)
 words  | text     | not null
 idxfti | tsvector | not null
Indexes:
    "ftstest_idx" gist (idxfti)

=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
                                                            QUERY PLAN                 
                                            
-----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ftstest_idx on ftstest  (cost=0.00..515.90 rows=271 width=4) (actual 
time=219.694..376042.428 rows=4796 loops=1)
   Index Cond: (idxfti @@ '\'dominican\''::tsquery)
   Filter: (idxfti @@ '\'dominican\''::tsquery)
 Total runtime: 376061.541 ms
(4 rows)


((Set enable_indexscan=false))


=> explain analyze select idx from ftstest where idxfti @@ 'dominican'::tsquery;
                                                  QUERY PLAN                           
                       
--------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..5765.88 rows=271 width=4) (actual 
time=42.589..62158.285 rows=4796 loops=1)
   Filter: (idxfti @@ '\'dominican\''::tsquery)
 Total runtime: 62182.277 ms
(3 rows)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Re: [PERFORM] For full text indexing, which is better, tsearch2 or

Reply via email to