Re: [HACKERS] 8.3 full text search docs

2007-10-15 Thread Tom Lane
andy [EMAIL PROTECTED] writes:
 I'd like to offer: example usage and Upgrading.

I did some hacking on your perl script to make it a bit more
bulletproof; I was worried about removing any function named 'concat'
for instance.  Attached is what it looks like now.

I'm not entirely sure what to do with the script --- perhaps we should
put it into contrib/tsearch2 and have it get installed from there?

regards, tom lane

#! /usr/bin/perl

# syntax:  removets2 dumpfile.dump [schema]  newtoc

# The dumpfile must have been made with pg_dump -Fc or -Ft
# If the tsearch2 objects were placed in some schema other than public,
# specify that schema's name as the second argument.
#
# pg_restore must be in your PATH
#
# We use pg_restore to get the dumpfile's table of contents (--list) and
# emit a new table of contents with all the tsearch2 objects stripped out.
# You can then do
#   pg_restore -L newtoc dumpfile.dump ...
# to load your dump into Postgres 8.3.

use strict;

my $dumpfile = $ARGV[0];
my $schema = $ARGV[1] || 'public';

my %list = (
'FUNCTION gtsq_in(cstring)' = 1,
'FUNCTION gtsq_out(gtsq)' = 1,
'TYPE gtsq' = 1,
'FUNCTION gtsvector_in(cstring)' = 1,
'FUNCTION gtsvector_out(gtsvector)' = 1,
'TYPE gtsvector' = 1,
'FUNCTION tsquery_in(cstring)' = 1,
'FUNCTION tsquery_out(tsquery)' = 1,
'TYPE tsquery' = 1,
'FUNCTION tsvector_in(cstring)' = 1,
'FUNCTION tsvector_out(tsvector)' = 1,
'TYPE tsvector' = 1,
'TYPE statinfo' = 1,
'TYPE tokenout' = 1,
'TYPE tokentype' = 1,
'TYPE tsdebug' = 1,
'FUNCTION _get_parser_from_curcfg()' = 1,
'FUNCTION concat(tsvector, tsvector)' = 1,
'FUNCTION dex_init(internal)' = 1,
'FUNCTION dex_lexize(internal, internal, integer)' = 1,
'FUNCTION exectsq(tsvector, tsquery)' = 1,
'FUNCTION get_covers(tsvector, tsquery)' = 1,
'FUNCTION gin_extract_tsquery(tsquery, internal, internal)' = 1,
'FUNCTION gin_extract_tsvector(tsvector, internal)' = 1,
'FUNCTION gin_ts_consistent(internal, internal, tsquery)' = 1,
'FUNCTION gtsq_compress(internal)' = 1,
'FUNCTION gtsq_consistent(gtsq, internal, integer)' = 1,
'FUNCTION gtsq_decompress(internal)' = 1,
'FUNCTION gtsq_penalty(internal, internal, internal)' = 1,
'FUNCTION gtsq_picksplit(internal, internal)' = 1,
'FUNCTION gtsq_same(gtsq, gtsq, internal)' = 1,
'FUNCTION gtsq_union(bytea, internal)' = 1,
'FUNCTION gtsvector_compress(internal)' = 1,
'FUNCTION gtsvector_consistent(gtsvector, internal, integer)' = 1,
'FUNCTION gtsvector_decompress(internal)' = 1,
'FUNCTION gtsvector_penalty(internal, internal, internal)' = 1,
'FUNCTION gtsvector_picksplit(internal, internal)' = 1,
'FUNCTION gtsvector_same(gtsvector, gtsvector, internal)' = 1,
'FUNCTION gtsvector_union(internal, internal)' = 1,
'FUNCTION headline(oid, text, tsquery, text)' = 1,
'FUNCTION headline(oid, text, tsquery)' = 1,
'FUNCTION headline(text, text, tsquery, text)' = 1,
'FUNCTION headline(text, text, tsquery)' = 1,
'FUNCTION headline(text, tsquery, text)' = 1,
'FUNCTION headline(text, tsquery)' = 1,
'FUNCTION length(tsvector)' = 1,
'FUNCTION lexize(oid, text)' = 1,
'FUNCTION lexize(text, text)' = 1,
'FUNCTION lexize(text)' = 1,
'FUNCTION numnode(tsquery)' = 1,
'FUNCTION parse(oid, text)' = 1,
'FUNCTION parse(text, text)' = 1,
'FUNCTION parse(text)' = 1,
'FUNCTION plainto_tsquery(oid, text)' = 1,
'FUNCTION plainto_tsquery(text, text)' = 1,
'FUNCTION plainto_tsquery(text)' = 1,
'FUNCTION prsd_end(internal)' = 1,
'FUNCTION prsd_getlexeme(internal, internal, internal)' = 1,
'FUNCTION prsd_headline(internal, internal, internal)' = 1,
'FUNCTION prsd_lextype(internal)' = 1,
'FUNCTION prsd_start(internal, integer)' = 1,
'FUNCTION querytree(tsquery)' = 1,
'FUNCTION rank(real[], tsvector, tsquery)' = 1,
'FUNCTION rank(real[], tsvector, tsquery, integer)' = 1,
'FUNCTION rank(tsvector, tsquery)' = 1,
'FUNCTION rank(tsvector, tsquery, integer)' = 1,
'FUNCTION rank_cd(real[], tsvector, tsquery)' = 1,
'FUNCTION rank_cd(real[], tsvector, tsquery, integer)' = 1,
'FUNCTION rank_cd(tsvector, tsquery)' = 1,
'FUNCTION rank_cd(tsvector, tsquery, integer)' = 1,
'FUNCTION reset_tsearch()' = 1,
'FUNCTION rewrite(tsquery, text)' = 1,
'FUNCTION rewrite(tsquery, tsquery, tsquery)' = 1,
'FUNCTION rewrite_accum(tsquery, tsquery[])' = 1,
'FUNCTION rewrite_finish(tsquery)' = 1,
'FUNCTION rexectsq(tsquery, tsvector)' = 1,
'FUNCTION set_curcfg(integer)' = 1,
'FUNCTION set_curcfg(text)' = 1,
'FUNCTION set_curdict(integer)' = 1,
'FUNCTION set_curdict(text)' = 1,
'FUNCTION set_curprs(integer)' = 1,
'FUNCTION set_curprs(text)' = 1,
'FUNCTION setweight(tsvector, char)' = 1,
'FUNCTION show_curcfg()' = 1,
'FUNCTION snb_en_init(internal)' = 1,
'FUNCTION snb_lexize(internal, internal, integer)' = 1,
'FUNCTION snb_ru_init_koi8(internal)' = 1,
'FUNCTION snb_ru_init_utf8(internal)' = 1,
'FUNCTION spell_init(internal)' = 1,
'FUNCTION spell_lexize(internal, internal, integer)' = 1,
'FUNCTION stat(text)' = 1,
'FUNCTION stat(text, text)' = 1,
'FUNCTION strip(tsvector)' = 1,
'FUNCTION syn_init(internal)' = 1,

Re: [HACKERS] 8.3 full text search docs

2007-10-15 Thread Oleg Bartunov

On Mon, 15 Oct 2007, Tom Lane wrote:


andy [EMAIL PROTECTED] writes:

I'd like to offer: example usage and Upgrading.


I did some hacking on your perl script to make it a bit more
bulletproof; I was worried about removing any function named 'concat'
for instance.  Attached is what it looks like now.

I'm not entirely sure what to do with the script --- perhaps we should
put it into contrib/tsearch2 and have it get installed from there?


yes, contrib/tsearch2 looks ok. As a bonus we could back-port it to the
previous releases and mention in release notes.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] 8.3 full text search docs

2007-10-13 Thread andy
I have two doc updates I'd like to offer.  I see we have two example 
sections: creating rule-based dict's and creating parsers.  When I was 
starting I would have liked to see an example usage.


I'd like to offer: example usage and Upgrading.
This is my first draft, if anyone has suggestions I'd be interested in 
hearing them.  Also, I'm not sure where or who to send this to, so I 
hope -hacker is ok.


- Example Usage -
Staring a new project with Full Text Searching is easy.  There is 
nothing to install anymore, its all built in (in fact, don't install the 
contrib module tsearch2 because it will conflict with the tsearch2 built 
into the core).


We need to add a new column of type tsvector to the table you'd like to 
search.  In this example we'll use a table called notes.  If your table 
exists use:


alter table notes add searchvec tsvector;

If not use:

create table notes (
rowid integer,
note text,
searchvec tsvector
);

The searchvec column is what we will use for searching, so you probably 
want to create an index on it... from another place in the manual:


(http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)

GiST indexes are very good for dynamic data and fast if the number of 
unique words (lexemes) is under 100,000,

GIN indexes are best for static data with +100,000 lexemes.

For our example, I'll create a gist index:

create index notesvec on notes using gist(searchvec);

If you have existing data, we need to fill the searchvec column:

update notes set searchvec = to_tsvector(note);

After the update, any rows are inserted or updated will not have their 
searchvec column set automatically, for that we need to create a trigger:


create trigger notevecupdate
before insert or update on notes
for each row
execute procedure
  tsvector_update_trigger(searchvec, 'pg_catalog.english', note);


Some data:
insert into notes(rowid, note) values(1, 'this is a test');

insert into notes(rowid, note)
values(2, 'I do not like green eggs and ham');

insert into notes(rowid, note) values(3, 'the cat in the hat');

insert into notes(rowid, note)
values(4, 'rage against the dying of the light');

And now we can query it:

select * from notes where searchvec @@ to_tsquery('light');

or

select * from notes, to_tsquery('test') as q where searchvec @@ q;

writing it this way lets you reuse the tsquery q like this:

select note, ts_rank(searchvec, q)
from notes, to_tsquery('test') as q
where searchvec @@ q
order by ts_rank(searchvec, q);
- Example Usage -



- Upgrade from prior versions -

When tsearch2 was put into core, some functions and types were renamed, 
among other things.  A simple backup and restore will not work to 
migrate your database from versions below 8.3 to 8.3.


In general, the way to do it is backup all your data without the 
tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff 
by hand.


(Its easier to think of this as an upgrade from tsearch2 to tsearch3, 
but without the whole renaming it to tsearch3 thing)


To make it a little easier, there is a way using the pg_restore to 
selective restore everything that is not in the old tsearch2.


First you must use pg_dump -Fc to backup your existing database.

Then we will create an item list of things we want to restore using this 
perl script.  It will strip out all the things that look like tsearch2, 
and return (to stdout) a list of things you should restore.


For example:

pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
perl maketoc.pl ubberbase.bak  toc
# now restore just the items in toc
pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak

There is one thing that will fail, that's the trigger you used to update 
the tsvector column.  Its because the function tsearch2 was renamed to 
tsvector_update_trigger.  You'll need to recreate the trigger by hand.


Now that the structures and data are restored, you'll need to go through 
and redo the tsearch2 stuff by hand.


After you get the database fixed up, you'll also need to update your 
client programs (php, perl, etc).  For the most part, just renameing 
things (like rank to ts_rank) should be all that's required.



Oleg Bartunov has an incomplete list of items that have been renamed:

http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes

- Upgrade from prior versions -

#!/usr/bin/perl

# pg_restore must be in the path
#
# we use pg_restore to get the toc (requires the backup be made with pg_dump 
-Fc) from a backup
# create a newtoc with all the things that look like tsearch2 stuff stripped out
# so you can pg_restore -Fc -L newtoc ...

# syntax:  xtract dumpfile.bak  newtoc

use strict;

my %list = ('gtsq' = 1,
'gtsq_in' = 1,
'gtsq_out' = 1,
'gtsvector' = 1,
'gtsvector_in' = 1,
'gtsvector_out' = 1,
'tsquery' = 1,
'tsquery_in' = 1,
'tsquery_out' = 1,
'tsvector' = 1,
'tsvector_in' = 1,
'tsvector_out' = 1,
'statinfo' = 1,
'tokenout' = 1,
'tokentype' = 1,
'tsdebug' = 1,

Re: [HACKERS] 8.3 full text search docs

2007-10-13 Thread Albert Cervera i Areny
Andy,
note that documentation is discussed in the pgsql-docs list and patches 
usually are submitted to the pgsql-patches list. Nice to see both new 
sections, by the way.

A Diumenge 14 Octubre 2007, andy va escriure:
 I have two doc updates I'd like to offer.  I see we have two example
 sections: creating rule-based dict's and creating parsers.  When I was
 starting I would have liked to see an example usage.

 I'd like to offer: example usage and Upgrading.
 This is my first draft, if anyone has suggestions I'd be interested in
 hearing them.  Also, I'm not sure where or who to send this to, so I
 hope -hacker is ok.

 - Example Usage -
 Staring a new project with Full Text Searching is easy.  There is
 nothing to install anymore, its all built in (in fact, don't install the
 contrib module tsearch2 because it will conflict with the tsearch2 built
 into the core).

 We need to add a new column of type tsvector to the table you'd like to
 search.  In this example we'll use a table called notes.  If your table
 exists use:

 alter table notes add searchvec tsvector;

 If not use:

 create table notes (
   rowid integer,
   note text,
   searchvec tsvector
 );

 The searchvec column is what we will use for searching, so you probably
 want to create an index on it... from another place in the manual:

 (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)

 GiST indexes are very good for dynamic data and fast if the number of
 unique words (lexemes) is under 100,000,
 GIN indexes are best for static data with +100,000 lexemes.

 For our example, I'll create a gist index:

 create index notesvec on notes using gist(searchvec);

 If you have existing data, we need to fill the searchvec column:

 update notes set searchvec = to_tsvector(note);

 After the update, any rows are inserted or updated will not have their
 searchvec column set automatically, for that we need to create a trigger:

 create trigger notevecupdate
 before insert or update on notes
 for each row
 execute procedure
tsvector_update_trigger(searchvec, 'pg_catalog.english', note);


 Some data:
 insert into notes(rowid, note) values(1, 'this is a test');

 insert into notes(rowid, note)
 values(2, 'I do not like green eggs and ham');

 insert into notes(rowid, note) values(3, 'the cat in the hat');

 insert into notes(rowid, note)
 values(4, 'rage against the dying of the light');

 And now we can query it:

 select * from notes where searchvec @@ to_tsquery('light');

 or

 select * from notes, to_tsquery('test') as q where searchvec @@ q;

 writing it this way lets you reuse the tsquery q like this:

 select note, ts_rank(searchvec, q)
 from notes, to_tsquery('test') as q
 where searchvec @@ q
 order by ts_rank(searchvec, q);
 - Example Usage -



 - Upgrade from prior versions -

 When tsearch2 was put into core, some functions and types were renamed,
 among other things.  A simple backup and restore will not work to
 migrate your database from versions below 8.3 to 8.3.

 In general, the way to do it is backup all your data without the
 tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
 by hand.

 (Its easier to think of this as an upgrade from tsearch2 to tsearch3,
 but without the whole renaming it to tsearch3 thing)

 To make it a little easier, there is a way using the pg_restore to
 selective restore everything that is not in the old tsearch2.

 First you must use pg_dump -Fc to backup your existing database.

 Then we will create an item list of things we want to restore using this
 perl script.  It will strip out all the things that look like tsearch2,
 and return (to stdout) a list of things you should restore.

 For example:

 pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
 perl maketoc.pl ubberbase.bak  toc
 # now restore just the items in toc
 pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak

 There is one thing that will fail, that's the trigger you used to update
 the tsvector column.  Its because the function tsearch2 was renamed to
 tsvector_update_trigger.  You'll need to recreate the trigger by hand.

 Now that the structures and data are restored, you'll need to go through
 and redo the tsearch2 stuff by hand.

 After you get the database fixed up, you'll also need to update your
 client programs (php, perl, etc).  For the most part, just renameing
 things (like rank to ts_rank) should be all that's required.


 Oleg Bartunov has an incomplete list of items that have been renamed:

 http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes

 - Upgrade from prior versions -



-- 
Albert Cervera i Areny
http://www.NaN-tic.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match