Re: [HACKERS] 8.3 full text search docs

Albert Cervera i Areny Sat, 13 Oct 2007 15:28:35 -0700

Andy,
note that documentation is discussed in the pgsql-docs list and patches 
usually are submitted to the pgsql-patches list. Nice to see both new 
sections, by the way.


A Diumenge 14 Octubre 2007, andy va escriure:
> I have two doc updates I'd like to offer.  I see we have two example
> sections: creating rule-based dict's and creating parsers.  When I was
> starting I would have liked to see an example usage.
>
> I'd like to offer: "example usage" and "Upgrading".
> This is my first draft, if anyone has suggestions I'd be interested in
> hearing them.  Also, I'm not sure where or who to send this to, so I
> hope -hacker is ok.
>
> ----- Example Usage -----
> Staring a new project with Full Text Searching is easy.  There is
> nothing to install anymore, its all built in (in fact, don't install the
> contrib module tsearch2 because it will conflict with the tsearch2 built
> into the core).
>
> We need to add a new column of type tsvector to the table you'd like to
> search.  In this example we'll use a table called notes.  If your table
> exists use:
>
> alter table notes add searchvec tsvector;
>
> If not use:
>
> create table notes (
>       rowid integer,
>       note text,
>       searchvec tsvector
> );
>
> The searchvec column is what we will use for searching, so you probably
> want to create an index on it... from another place in the manual:
>
> (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)
>
> GiST indexes are very good for dynamic data and fast if the number of
> unique words (lexemes) is under 100,000,
> GIN indexes are best for static data with +100,000 lexemes.
>
> For our example, I'll create a gist index:
>
> create index notesvec on notes using gist(searchvec);
>
> If you have existing data, we need to fill the searchvec column:
>
> update notes set searchvec = to_tsvector(note);
>
> After the update, any rows are inserted or updated will not have their
> searchvec column set automatically, for that we need to create a trigger:
>
> create trigger notevecupdate
> before insert or update on notes
> for each row
> execute procedure
>    tsvector_update_trigger(searchvec, 'pg_catalog.english', note);
>
>
> Some data:
> insert into notes(rowid, note) values(1, 'this is a test');
>
> insert into notes(rowid, note)
> values(2, 'I do not like green eggs and ham');
>
> insert into notes(rowid, note) values(3, 'the cat in the hat');
>
> insert into notes(rowid, note)
> values(4, 'rage against the dying of the light');
>
> And now we can query it:
>
> select * from notes where searchvec @@ to_tsquery('light');
>
> or
>
> select * from notes, to_tsquery('test') as q where searchvec @@ q;
>
> writing it this way lets you reuse the tsquery "q" like this:
>
> select note, ts_rank(searchvec, q)
> from notes, to_tsquery('test') as q
> where searchvec @@ q
> order by ts_rank(searchvec, q);
> ----- Example Usage -----
>
>
>
> ----- Upgrade from prior versions -----
>
> When tsearch2 was put into core, some functions and types were renamed,
> among other things.  A simple backup and restore will not work to
> migrate your database from versions below 8.3 to 8.3.
>
> In general, the way to do it is backup all your data without the
> tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
> by hand.
>
> (Its easier to think of this as an upgrade from tsearch2 to tsearch3,
> but without the whole renaming it to tsearch3 thing)
>
> To make it a little easier, there is a way using the pg_restore to
> selective restore everything that is not in the old tsearch2.
>
> First you must use "pg_dump -Fc" to backup your existing database.
>
> Then we will create an item list of things we want to restore using this
> perl script.  It will strip out all the things that look like tsearch2,
> and return (to stdout) a list of things you should restore.
>
> For example:
>
> pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
> perl maketoc.pl ubberbase.bak > toc
> # now restore just the items in toc
> pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak
>
> There is one thing that will fail, that's the trigger you used to update
> the tsvector column.  Its because the function tsearch2 was renamed to
> tsvector_update_trigger.  You'll need to recreate the trigger by hand.
>
> Now that the structures and data are restored, you'll need to go through
> and redo the tsearch2 stuff by hand.
>
> After you get the database fixed up, you'll also need to update your
> client programs (php, perl, etc).  For the most part, just renameing
> things (like rank to ts_rank) should be all that's required.
>
>
> Oleg Bartunov has an incomplete list of items that have been renamed:
>
> http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
>
> ----- Upgrade from prior versions -----



-- 
Albert Cervera i Areny
http://www.NaN-tic.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: [HACKERS] 8.3 full text search docs

Reply via email to