Re: [HACKERS] 8.3 full text search docs

2007-10-15 Thread Oleg Bartunov

On Mon, 15 Oct 2007, Tom Lane wrote:


andy <[EMAIL PROTECTED]> writes:

I'd like to offer: "example usage" and "Upgrading".


I did some hacking on your perl script to make it a bit more
bulletproof; I was worried about removing any function named 'concat'
for instance.  Attached is what it looks like now.

I'm not entirely sure what to do with the script --- perhaps we should
put it into contrib/tsearch2 and have it get installed from there?


yes, contrib/tsearch2 looks ok. As a bonus we could back-port it to the
previous releases and mention in release notes.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.3 full text search docs

2007-10-15 Thread Tom Lane
andy <[EMAIL PROTECTED]> writes:
> I'd like to offer: "example usage" and "Upgrading".

I did some hacking on your perl script to make it a bit more
bulletproof; I was worried about removing any function named 'concat'
for instance.  Attached is what it looks like now.

I'm not entirely sure what to do with the script --- perhaps we should
put it into contrib/tsearch2 and have it get installed from there?

regards, tom lane

#! /usr/bin/perl

# syntax:  removets2 dumpfile.dump [schema] > newtoc

# The dumpfile must have been made with pg_dump -Fc or -Ft
# If the tsearch2 objects were placed in some schema other than public,
# specify that schema's name as the second argument.
#
# pg_restore must be in your PATH
#
# We use pg_restore to get the dumpfile's table of contents (--list) and
# emit a new table of contents with all the tsearch2 objects stripped out.
# You can then do
#   pg_restore -L newtoc dumpfile.dump ...
# to load your dump into Postgres 8.3.

use strict;

my $dumpfile = $ARGV[0];
my $schema = $ARGV[1] || 'public';

my %list = (
'FUNCTION gtsq_in(cstring)' => 1,
'FUNCTION gtsq_out(gtsq)' => 1,
'TYPE gtsq' => 1,
'FUNCTION gtsvector_in(cstring)' => 1,
'FUNCTION gtsvector_out(gtsvector)' => 1,
'TYPE gtsvector' => 1,
'FUNCTION tsquery_in(cstring)' => 1,
'FUNCTION tsquery_out(tsquery)' => 1,
'TYPE tsquery' => 1,
'FUNCTION tsvector_in(cstring)' => 1,
'FUNCTION tsvector_out(tsvector)' => 1,
'TYPE tsvector' => 1,
'TYPE statinfo' => 1,
'TYPE tokenout' => 1,
'TYPE tokentype' => 1,
'TYPE tsdebug' => 1,
'FUNCTION _get_parser_from_curcfg()' => 1,
'FUNCTION concat(tsvector, tsvector)' => 1,
'FUNCTION dex_init(internal)' => 1,
'FUNCTION dex_lexize(internal, internal, integer)' => 1,
'FUNCTION exectsq(tsvector, tsquery)' => 1,
'FUNCTION get_covers(tsvector, tsquery)' => 1,
'FUNCTION gin_extract_tsquery(tsquery, internal, internal)' => 1,
'FUNCTION gin_extract_tsvector(tsvector, internal)' => 1,
'FUNCTION gin_ts_consistent(internal, internal, tsquery)' => 1,
'FUNCTION gtsq_compress(internal)' => 1,
'FUNCTION gtsq_consistent(gtsq, internal, integer)' => 1,
'FUNCTION gtsq_decompress(internal)' => 1,
'FUNCTION gtsq_penalty(internal, internal, internal)' => 1,
'FUNCTION gtsq_picksplit(internal, internal)' => 1,
'FUNCTION gtsq_same(gtsq, gtsq, internal)' => 1,
'FUNCTION gtsq_union(bytea, internal)' => 1,
'FUNCTION gtsvector_compress(internal)' => 1,
'FUNCTION gtsvector_consistent(gtsvector, internal, integer)' => 1,
'FUNCTION gtsvector_decompress(internal)' => 1,
'FUNCTION gtsvector_penalty(internal, internal, internal)' => 1,
'FUNCTION gtsvector_picksplit(internal, internal)' => 1,
'FUNCTION gtsvector_same(gtsvector, gtsvector, internal)' => 1,
'FUNCTION gtsvector_union(internal, internal)' => 1,
'FUNCTION headline(oid, text, tsquery, text)' => 1,
'FUNCTION headline(oid, text, tsquery)' => 1,
'FUNCTION headline(text, text, tsquery, text)' => 1,
'FUNCTION headline(text, text, tsquery)' => 1,
'FUNCTION headline(text, tsquery, text)' => 1,
'FUNCTION headline(text, tsquery)' => 1,
'FUNCTION length(tsvector)' => 1,
'FUNCTION lexize(oid, text)' => 1,
'FUNCTION lexize(text, text)' => 1,
'FUNCTION lexize(text)' => 1,
'FUNCTION numnode(tsquery)' => 1,
'FUNCTION parse(oid, text)' => 1,
'FUNCTION parse(text, text)' => 1,
'FUNCTION parse(text)' => 1,
'FUNCTION plainto_tsquery(oid, text)' => 1,
'FUNCTION plainto_tsquery(text, text)' => 1,
'FUNCTION plainto_tsquery(text)' => 1,
'FUNCTION prsd_end(internal)' => 1,
'FUNCTION prsd_getlexeme(internal, internal, internal)' => 1,
'FUNCTION prsd_headline(internal, internal, internal)' => 1,
'FUNCTION prsd_lextype(internal)' => 1,
'FUNCTION prsd_start(internal, integer)' => 1,
'FUNCTION querytree(tsquery)' => 1,
'FUNCTION rank(real[], tsvector, tsquery)' => 1,
'FUNCTION rank(real[], tsvector, tsquery, integer)' => 1,
'FUNCTION rank(tsvector, tsquery)' => 1,
'FUNCTION rank(tsvector, tsquery, integer)' => 1,
'FUNCTION rank_cd(real[], tsvector, tsquery)' => 1,
'FUNCTION rank_cd(real[], tsvector, tsquery, integer)' => 1,
'FUNCTION rank_cd(tsvector, tsquery)' => 1,
'FUNCTION rank_cd(tsvector, tsquery, integer)' => 1,
'FUNCTION reset_tsearch()' => 1,
'FUNCTION rewrite(tsquery, text)' => 1,
'FUNCTION rewrite(tsquery, tsquery, tsquery)' => 1,
'FUNCTION rewrite_accum(tsquery, tsquery[])' => 1,
'FUNCTION rewrite_finish(tsquery)' => 1,
'FUNCTION rexectsq(tsquery, tsvector)' => 1,
'FUNCTION set_curcfg(integer)' => 1,
'FUNCTION set_curcfg(text)' => 1,
'FUNCTION set_curdict(integer)' => 1,
'FUNCTION set_curdict(text)' => 1,
'FUNCTION set_curprs(integer)' => 1,
'FUNCTION set_curprs(text)' => 1,
'FUNCTION setweight(tsvector, "char")' => 1,
'FUNCTION show_curcfg()' => 1,
'FUNCTION snb_en_init(internal)' => 1,
'FUNCTION snb_lexize(internal, internal, integer)' => 1,
'FUNCTION snb_ru_init_koi8(internal)' => 1,
'FUNCTION snb_ru_init_utf8(internal)' => 1,
'FUNCTION spell_init(internal)' => 1,
'FUNCTION spell_lexize(internal, internal, integer)' => 1,
'FUNCTION stat(text)' => 1,

Re: [HACKERS] 8.3 full text search docs

2007-10-13 Thread Albert Cervera i Areny
Andy,
note that documentation is discussed in the pgsql-docs list and patches 
usually are submitted to the pgsql-patches list. Nice to see both new 
sections, by the way.

A Diumenge 14 Octubre 2007, andy va escriure:
> I have two doc updates I'd like to offer.  I see we have two example
> sections: creating rule-based dict's and creating parsers.  When I was
> starting I would have liked to see an example usage.
>
> I'd like to offer: "example usage" and "Upgrading".
> This is my first draft, if anyone has suggestions I'd be interested in
> hearing them.  Also, I'm not sure where or who to send this to, so I
> hope -hacker is ok.
>
> - Example Usage -
> Staring a new project with Full Text Searching is easy.  There is
> nothing to install anymore, its all built in (in fact, don't install the
> contrib module tsearch2 because it will conflict with the tsearch2 built
> into the core).
>
> We need to add a new column of type tsvector to the table you'd like to
> search.  In this example we'll use a table called notes.  If your table
> exists use:
>
> alter table notes add searchvec tsvector;
>
> If not use:
>
> create table notes (
>   rowid integer,
>   note text,
>   searchvec tsvector
> );
>
> The searchvec column is what we will use for searching, so you probably
> want to create an index on it... from another place in the manual:
>
> (http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html)
>
> GiST indexes are very good for dynamic data and fast if the number of
> unique words (lexemes) is under 100,000,
> GIN indexes are best for static data with +100,000 lexemes.
>
> For our example, I'll create a gist index:
>
> create index notesvec on notes using gist(searchvec);
>
> If you have existing data, we need to fill the searchvec column:
>
> update notes set searchvec = to_tsvector(note);
>
> After the update, any rows are inserted or updated will not have their
> searchvec column set automatically, for that we need to create a trigger:
>
> create trigger notevecupdate
> before insert or update on notes
> for each row
> execute procedure
>tsvector_update_trigger(searchvec, 'pg_catalog.english', note);
>
>
> Some data:
> insert into notes(rowid, note) values(1, 'this is a test');
>
> insert into notes(rowid, note)
> values(2, 'I do not like green eggs and ham');
>
> insert into notes(rowid, note) values(3, 'the cat in the hat');
>
> insert into notes(rowid, note)
> values(4, 'rage against the dying of the light');
>
> And now we can query it:
>
> select * from notes where searchvec @@ to_tsquery('light');
>
> or
>
> select * from notes, to_tsquery('test') as q where searchvec @@ q;
>
> writing it this way lets you reuse the tsquery "q" like this:
>
> select note, ts_rank(searchvec, q)
> from notes, to_tsquery('test') as q
> where searchvec @@ q
> order by ts_rank(searchvec, q);
> - Example Usage -
>
>
>
> - Upgrade from prior versions -
>
> When tsearch2 was put into core, some functions and types were renamed,
> among other things.  A simple backup and restore will not work to
> migrate your database from versions below 8.3 to 8.3.
>
> In general, the way to do it is backup all your data without the
> tsearch2 stuff, restore just the data, then recreate the tsearch2 stuff
> by hand.
>
> (Its easier to think of this as an upgrade from tsearch2 to tsearch3,
> but without the whole renaming it to tsearch3 thing)
>
> To make it a little easier, there is a way using the pg_restore to
> selective restore everything that is not in the old tsearch2.
>
> First you must use "pg_dump -Fc" to backup your existing database.
>
> Then we will create an item list of things we want to restore using this
> perl script.  It will strip out all the things that look like tsearch2,
> and return (to stdout) a list of things you should restore.
>
> For example:
>
> pg_dump -Fc -h oldserver -f ubberbase.bak ubberbase
> perl maketoc.pl ubberbase.bak > toc
> # now restore just the items in toc
> pg_restore -Fc --create -d postgres --use-list toc -f ubberbse.bak
>
> There is one thing that will fail, that's the trigger you used to update
> the tsvector column.  Its because the function tsearch2 was renamed to
> tsvector_update_trigger.  You'll need to recreate the trigger by hand.
>
> Now that the structures and data are restored, you'll need to go through
> and redo the tsearch2 stuff by hand.
>
> After you get the database fixed up, you'll also need to update your
> client programs (php, perl, etc).  For the most part, just renameing
> things (like rank to ts_rank) should be all that's required.
>
>
> Oleg Bartunov has an incomplete list of items that have been renamed:
>
> http://www.sai.msu.su/~megera/wiki/Tsearch2_83_changes
>
> - Upgrade from prior versions -



-- 
Albert Cervera i Areny
http://www.NaN-tic.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an