This patch documents my latest tweak to the Tsearch2/pg_trgm integration: it is much better to build the word list from unstemmed words than from stemmed ones.

Chris
Index: contrib/pg_trgm/README.pg_trgm
===================================================================
RCS file: /projects/cvsroot/pgsql/contrib/pg_trgm/README.pg_trgm,v
retrieving revision 1.1
diff -c -r1.1 README.pg_trgm
*** contrib/pg_trgm/README.pg_trgm      31 May 2004 17:18:11 -0000      1.1
--- contrib/pg_trgm/README.pg_trgm      26 Nov 2004 01:31:39 -0000
***************
*** 100,110 ****
        The first step is to generate an auxiliary table containing all
        the unique words in the Tsearch2 index:
  
!       CREATE TABLE words AS 
!               SELECT word FROM stat('SELECT vector FROM documents');
  
!       Where 'documents' is the table that contains the Tsearch2 index
!       column 'vector', of type 'tsvector'.
  
        Next, create a trigram index on the word column:
  
--- 100,114 ----
        The first step is to generate an auxiliary table containing all
        the unique words in the Tsearch2 index:
  
!       CREATE TABLE words AS SELECT word FROM
!               stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
  
!       Where 'documents' is a table with a text column 'bodytext'
!       that is searched with Tsearch2.  Using the 'simple' dictionary
!       with the to_tsvector function, instead of the already existing
!       tsvector column, avoids creating a list of already stemmed
!       words.  This way, only the original, unstemmed words are added
!       to the word list.
  
        Next, create a trigram index on the word column:
  
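To illustrate why unstemmed words make better suggestions, here is a minimal Python sketch of pg_trgm-style trigram similarity. It is a simplification, not pg_trgm's actual code: it assumes each word is lowercased and padded with two leading spaces and one trailing space, and that similarity is the number of shared trigrams divided by the size of the union of both trigram sets. The misspelling 'beutiful' and the stem 'beauti' are hypothetical examples; 'beauti' stands in for what an English stemmer might produce for 'beautiful'.

```python
def trigrams(word: str) -> set[str]:
    # Simplified pg_trgm-style extraction: lowercase, pad with two
    # leading spaces and one trailing space, take every 3-char window.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by the size of the union of both sets.
    ta, tb = trigrams(a), trigrams(b)
    shared = len(ta & tb)
    return shared / (len(ta) + len(tb) - shared)


def best_match(term: str, words: list[str]) -> str:
    # The word-list entry most similar to the search term.
    return max(words, key=lambda w: similarity(term, w))


unstemmed = ["beautiful", "beauty"]
stemmed = ["beauti"]  # hypothetical stemmer output for both words above

for name, words in (("unstemmed", unstemmed), ("stemmed", stemmed)):
    match = best_match("beutiful", words)
    print(name, match, round(similarity("beutiful", match), 2))
```

Against the unstemmed list the best match is 'beautiful' (similarity about 0.58); against the stemmed list it is 'beauti' (about 0.23), which is not a word worth suggesting to a user and, assuming the '%' operator's default threshold of 0.3, would not be returned at all.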