Re: [HACKERS] 8.3 full text search docs
On Mon, 15 Oct 2007, Tom Lane wrote: andy <[EMAIL PROTECTED]> writes: I'd like to offer: "example usage" and "Upgrading". I did some hacking on your perl script to make it a bit more bulletproof; I was worried about removing any function named 'concat' for instance. Attached is what it looks like now. I'm not entirely sure what to do with the script --- perhaps we should put it into contrib/tsearch2 and have it get installed from there? yes, contrib/tsearch2 looks ok. As a bonus we could back-port it to the previous releases and mention in release notes. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 8.3 full text search docs
andy <[EMAIL PROTECTED]> writes: > I'd like to offer: "example usage" and "Upgrading". I did some hacking on your perl script to make it a bit more bulletproof; I was worried about removing any function named 'concat' for instance. Attached is what it looks like now. I'm not entirely sure what to do with the script --- perhaps we should put it into contrib/tsearch2 and have it get installed from there? regards, tom lane #! /usr/bin/perl # syntax: removets2 dumpfile.dump [schema] > newtoc # The dumpfile must have been made with pg_dump -Fc or -Ft # If the tsearch2 objects were placed in some schema other than public, # specify that schema's name as the second argument. # # pg_restore must be in your PATH # # We use pg_restore to get the dumpfile's table of contents (--list) and # emit a new table of contents with all the tsearch2 objects stripped out. # You can then do # pg_restore -L newtoc dumpfile.dump ... # to load your dump into Postgres 8.3. use strict; my $dumpfile = $ARGV[0]; my $schema = $ARGV[1] || 'public'; my %list = ( 'FUNCTION gtsq_in(cstring)' => 1, 'FUNCTION gtsq_out(gtsq)' => 1, 'TYPE gtsq' => 1, 'FUNCTION gtsvector_in(cstring)' => 1, 'FUNCTION gtsvector_out(gtsvector)' => 1, 'TYPE gtsvector' => 1, 'FUNCTION tsquery_in(cstring)' => 1, 'FUNCTION tsquery_out(tsquery)' => 1, 'TYPE tsquery' => 1, 'FUNCTION tsvector_in(cstring)' => 1, 'FUNCTION tsvector_out(tsvector)' => 1, 'TYPE tsvector' => 1, 'TYPE statinfo' => 1, 'TYPE tokenout' => 1, 'TYPE tokentype' => 1, 'TYPE tsdebug' => 1, 'FUNCTION _get_parser_from_curcfg()' => 1, 'FUNCTION concat(tsvector, tsvector)' => 1, 'FUNCTION dex_init(internal)' => 1, 'FUNCTION dex_lexize(internal, internal, integer)' => 1, 'FUNCTION exectsq(tsvector, tsquery)' => 1, 'FUNCTION get_covers(tsvector, tsquery)' => 1, 'FUNCTION gin_extract_tsquery(tsquery, internal, internal)' => 1, 'FUNCTION gin_extract_tsvector(tsvector, internal)' => 1, 'FUNCTION gin_ts_consistent(internal, internal, tsquery)' => 1, 'FUNCTION gtsq_compress(internal)' => 1, 'FUNCTION gtsq_consistent(gtsq, internal, integer)' => 1, 'FUNCTION gtsq_decompress(internal)' => 1, 'FUNCTION gtsq_penalty(internal, internal, internal)' => 1, 'FUNCTION gtsq_picksplit(internal, internal)' => 1, 'FUNCTION gtsq_same(gtsq, gtsq, internal)' => 1, 'FUNCTION gtsq_union(bytea, internal)' => 1, 'FUNCTION gtsvector_compress(internal)' => 1, 'FUNCTION gtsvector_consistent(gtsvector, internal, integer)' => 1, 'FUNCTION gtsvector_decompress(internal)' => 1, 'FUNCTION gtsvector_penalty(internal, internal, internal)' => 1, 'FUNCTION gtsvector_picksplit(internal, internal)' => 1, 'FUNCTION gtsvector_same(gtsvector, gtsvector, internal)' => 1, 'FUNCTION gtsvector_union(internal, internal)' => 1, 'FUNCTION headline(oid, text, tsquery, text)' => 1, 'FUNCTION headline(oid, text, tsquery)' => 1, 'FUNCTION headline(text, text, tsquery, text)' => 1, 'FUNCTION headline(text, text, tsquery)' => 1, 'FUNCTION headline(text, tsquery, text)' => 1, 'FUNCTION headline(text, tsquery)' => 1, 'FUNCTION length(tsvector)' => 1, 'FUNCTION lexize(oid, text)' => 1, 'FUNCTION lexize(text, text)' => 1, 'FUNCTION lexize(text)' => 1, 'FUNCTION numnode(tsquery)' => 1, 'FUNCTION parse(oid, text)' => 1, 'FUNCTION parse(text, text)' => 1, 'FUNCTION parse(text)' => 1, 'FUNCTION plainto_tsquery(oid, text)' => 1, 'FUNCTION plainto_tsquery(text, text)' => 1, 'FUNCTION plainto_tsquery(text)' => 1, 'FUNCTION prsd_end(internal)' => 1, 'FUNCTION prsd_getlexeme(internal, internal, internal)' => 1, 'FUNCTION prsd_headline(internal, internal, internal)' => 1, 'FUNCTION prsd_lextype(internal)' => 1, 'FUNCTION prsd_start(internal, integer)' => 1, 'FUNCTION querytree(tsquery)' => 1, 'FUNCTION rank(real[], tsvector, tsquery)' => 1, 'FUNCTION rank(real[], tsvector, tsquery, integer)' => 1, 'FUNCTION rank(tsvector, tsquery)' => 1, 'FUNCTION rank(tsvector, tsquery, integer)' => 1, 'FUNCTION rank_cd(real[], tsvector, tsquery)' => 1, 'FUNCTION rank_cd(real[], tsvector, tsquery, integer)' => 1, 'FUNCTION rank_cd(tsvector, tsquery)' => 1, 'FUNCTION rank_cd(tsvector, tsquery, integer)' => 1, 'FUNCTION reset_tsearch()' => 1, 'FUNCTION rewrite(tsquery, text)' => 1, 'FUNCTION rewrite(tsquery, tsquery, tsquery)' => 1, 'FUNCTION rewrite_accum(tsquery, tsquery[])' => 1, 'FUNCTION rewrite_finish(tsquery)' => 1, 'FUNCTION rexectsq(tsquery, tsvector)' => 1, 'FUNCTION set_curcfg(integer)' => 1, 'FUNCTION set_curcfg(text)' => 1, 'FUNCTION set_curdict(integer)' => 1, 'FUNCTION set_curdict(text)' => 1, 'FUNCTION set_curprs(integer)' => 1, 'FUNCTION set_curprs(text)' => 1, 'FUNCTION setweight(tsvector, "char")' => 1, 'FUNCTION show_curcfg()' => 1, 'FUNCTION snb_en_init(internal)' => 1, 'FUNCTION snb_lexize(internal, internal, integer)' => 1, 'FUNCTION snb_ru_init_koi8(internal)' => 1, 'FUNCTION snb_ru_init_utf8(internal)' => 1, 'FUNCTION spell_init(internal)' => 1, 'FUNCTION spell_lexize(internal, internal, integer)' => 1, 'FUNCTION stat(text)' => 1,
Re: [HACKERS] 8.3 full text search docs
Andy, note that documentation is discussed in the pgsql-docs list and patches usually are submitted to the pgsql-patches list. Nice to see both new sections, by the way. A Diumenge 14 Octubre 2007, andy va escriure: > I have two doc updates I'd like to offer. I see we have two example > sections: creating rule-based dict's and creating parsers. When I was > starting I would have liked to see an example usage. > > I'd like to offer: "example usage" and "Upgrading". > This is my first draft, if anyone has suggestions I'd be interested in > hearing them. Also, I'm not sure where or who to send this to, so I > hope -hacker is ok. > > - Example Usage - > Staring a new project with Full Text Searching is easy. There is > nothing to install anymore, its all built in (in fact, don't install the > contrib module tsearch2 because it will conflict with the tsearch2 built > into the core). > > We need to add a new column of type tsvector to the table you'd like to > search. In this example we'll use a table called notes. If your table > exists use: > > alter table notes add searchvec tsvector; > > If not use: > > create table notes ( > rowid integer, > note text, > searchvec tsvector > ); > > The searchvec column is what we will use for searching, so you probably > want to create an index on it... from another place in the manual: > > (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html) > > GiST indexes are very good for dynamic data and fast if the number of > unique words (lexemes) is under 100,000, > GIN indexes are best for static data with +100,000 lexemes. > > For our example, I'll create a gist index: > > create index notesvec on notes using gist(searchvec); > > If you have existing data, we need to fill the searchvec column: > > update notes set searchvec = to_tsvector(note); > > After the update, any rows are inserted or updated will not have their > searchvec column set automatically, for that we need to create a trigger: > > create trigger notevecupdate > before insert or update on notes > for each row > execute procedure >tsvector_update_trigger(searchvec, 'pg_catalog.english', note); > > > Some data: > insert into notes(rowid, note) values(1, 'this is a test'); > > insert into notes(rowid, note) > values(2, 'I do not like green eggs and ham'); > > insert into notes(rowid, note) values(3, 'the cat in the hat'); > > insert into notes(rowid, note) > values(4, 'rage against the dying of the light'); > > And now we can query it: > > select * from notes where searchvec @@ to_tsquery('light'); > > or > > select * from notes, to_tsquery('test') as q where searchvec @@ q; > > writing it this way lets you reuse the tsquery "q" like this: > > select note, ts_rank(searchvec, q) > from notes, to_tsquery('test') as q > where searchvec @@ q > order by ts_rank(searchvec, q); > - Example Usage - > > > > - Upgrade from prior versions - > > When tsearch2 was put into core, some functions and types were renamed, > among other things. A simple backup and restore will not work to > migrate your database from versions below 8.3 to 8.3. > > In general, the way to do it is backup all your data without the > tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff > by hand. > > (Its easier to think of this as an upgrade from tsearch2 to tsearch3, > but without the whole renaming it to tsearch3 thing) > > To make it a little easier, there is a way using the pg_restore to > selective restore everything that is not in the old tsearch2. > > First you must use "pg_dump -Fc" to backup your existing database. > > Then we will create an item list of things we want to restore using this > perl script. It will strip out all the things that look like tsearch2, > and return (to stdout) a list of things you should restore. > > For example: > > pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase > perl maketoc.pl ubberbase.bak > toc > # now restore just the items in toc > pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak > > There is one thing that will fail, that's the trigger you used to update > the tsvector column. Its because the function tsearch2 was renamed to > tsvector_update_trigger. You'll need to recreate the trigger by hand. > > Now that the structures and data are restored, you'll need to go through > and redo the tsearch2 stuff by hand. > > After you get the database fixed up, you'll also need to update your > client programs (php, perl, etc). For the most part, just renameing > things (like rank to ts_rank) should be all that's required. > > > Oleg Bartunov has an incomplete list of items that have been renamed: > > http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes > > - Upgrade from prior versions - -- Albert Cervera i Areny http://www.NaN-tic.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an