[ADMIN] Using text search for locations, or a computed btree index

Jaume Sabater Thu, 04 Dec 2008 09:21:57 -0800

Hello everyone!

The guys creating the web site at the company I work for want an incremental 
search for locations. So, I need to find all locations that start with Uni, 
then Unit, then Unite, and so on. Problem is that the tables, although 
partitioned, have got several milion of tuples. Here is what I have in mind:


1. There are no stop words ("Palma of Majorca" is "Palma of Majorca", not 
"Palma Majorca", although this is not the best example).
2. There are no synonyms that have to be matched to other words.
3. No phrases have to be mapped to single words.
4. No variations of words need to be mapped to a canonical form (although this 
may be false when we add alternate names of locations)
5. I need to take out all accented characters and change all words to its 
lowercased form for faster searching and easier locating.

So, I've been reading all the text search documentation available online and I 
think that I should be:

1. Creating an extra column with the "normalized" version of the location name.
2. Creating a GIN index on that extra column.
3. Creating a synonyms (variations) dictionary with the alternate names 
(touristic names that have no political equivalent but people know and use to 
search)

What I don't know how to do is to just tell tsvector (and tsquery when 
searching) to do what I need and not what I don't need. I can't find a way in 
the online documentation to tell the engine which pieces I want from the whole 
puzzle.

Alternatively, provided we don't use alternate names, I could use a computed 
btree index that does those needed operations: lowercase, substitute accented 
characters by their equivalents with a regular expression, and store that into 
an extra colum, perhaps?

Any hints would be welcome. Thanks in advance.

--
Jaume Sabater
http://linuxsilo.net/

"Ubi sapientas ibi libertas"

-- 
Sent via pgsql-admin mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

[ADMIN] Using text search for locations, or a computed btree index

Reply via email to