Re: [GENERAL] Full text index without accents

Jonathan Bond-Caron Fri, 25 Jul 2008 05:51:02 -0700

This would probably help: 

CREATE OR REPLACE FUNCTION norm_text_latin(character varying)
  RETURNS character varying AS
$BODY$
declare
        p_str           alias for $1;
        v_str           varchar;
begin
        select translate(p_str, 'ÀÁÂÃÄÅ', 'AAAAAA') into v_str;
        select translate(v_str, 'ÉÈËÊ', 'EEEE') into v_str;
        select translate(v_str, 'ÌÍÎÏ', 'IIII') into v_str;
        select translate(v_str, 'ÌÍÎÏ', 'IIII') into v_str;
        select translate(v_str, 'ÒÓÔÕÖ', 'OOOOO') into v_str;
        select translate(v_str, 'ÙÚÛÜ', 'UUUU') into v_str;
        select translate(v_str, 'àáâãäå', 'aaaaaa') into v_str;
        select translate(v_str, 'èéêë', 'eeee') into v_str;
        select translate(v_str, 'ìíîï', 'iiii') into v_str;
        select translate(v_str, 'òóôõö', 'ooooo') into v_str;
        select translate(v_str, 'ùúûü', 'uuuu') into v_str;
        select translate(v_str, 'Çç', 'Cc') into v_str;
        return v_str;
end;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;


There's also o useful functions here:
http://www.project-open.org/doc/intranet-search-pg/intranet-search-pg-create.sql




-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fco. Mario 
Barcala Rodríguez
Sent: July 24, 2008 4:47 AM
To: [email protected]
Subject: Re: [GENERAL] Full text index without accents

Finally I create a function like:

CREATE OR REPLACE FUNCTION nonsensible (text) RETURNS text AS $$
DECLARE
  var1 varchar;
BEGIN
  var1=replace($1, 'á', 'a');
  var1=replace(var1, 'é', 'e');
  var1=replace(var1, 'í', 'i');
  var1=replace(var1, 'ó', 'o');
  var1=replace(var1, 'ú', 'u');
  var1=replace(var1, 'Á', 'A');
  var1=replace(var1, 'É', 'E');
  var1=replace(var1, 'Í', 'I');
  var1=replace(var1, 'Ó', 'O');
  var1=replace(var1, 'Ú', 'U');
  return var1;
END
$$LANGUAGE plpgsql immutable;

Then, create text indexes, one for sensible queries and other for
unsensible ones:

CREATE INDEX textindex ON document USING
gin(to_tsvector('spanish',text));

CREATE INDEX textindexn ON document USING
gin(to_tsvector('spanish',nonsensible(text)));

And then make a query sensible or unsensible to accents doing:

SELECT id FROM document WHERE to_tsvector('spanish',text) @@
to_tsquery('spanish','word_with_accent');

or:

SELECT id FROM document WHERE to_tsvector('spanish',nonsensible(text))
@@ to_tsquery('spanish',nonsensible('word_with_accent'));
respectively.

I think postgreSQL uses both indexes as necessary. I believe to remember
reading something about it in the documentation.

Thank you very much,

  Mario Barcala 


> Here is an example
> 
> CREATE FUNCTION dropatsymbol(text) RETURNS text
> AS 'select replace($1, ''@'', '' '');'
> LANGUAGE SQL;
> 
> arxiv=# select to_tsvector('english',dropatsymbol('[EMAIL PROTECTED]'));
>         to_tsvector
> -------------------------
>   'oleg':1 'sai.msu.su':2



-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Full text index without accents

Reply via email to