Re: [GENERAL] Accent insensitive search?

2009-03-30 Thread Stuart Bishop
On Tue, Mar 24, 2009 at 4:53 PM, Jasen Betts ja...@xnet.co.nz wrote:
 On 2009-03-18, cifroes cifr...@netcabo.pt wrote:
 This is a multi-part message in MIME format.

 --_=_NextPart_001_01C9A7E6.B32BBA87
 Content-Type: text/plain;
       charset=iso-8859-1
 Content-Transfer-Encoding: quoted-printable

 Hi,

 I have a DB in utf-8 and postgres 8.3.x.=20

 How can I do an accent insensitive search (like ...) ?

 use a posix regular expression that matches the string you want:

 select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';

 you could write a function to do the translation.



Unicode normalization is an interesting way to strip accents I've found:

 import unicodedata

 s = u'Björn'
 s = unicodedata.normalize(NFKD, s)
 s = ''.join(c for c in s if ord(c)  127)
 print s
Bjorn

You can also use the character names to map many more characters to
the ascii equivalent. A large number of these can me smashed into
ASCII using regular expressions and some manual mappings to map LETTER
THORN - th, LETTER LATERAL CLICK - X etc. Just mapping CAPITAL
LETTER XX - XX and SMALL LETTER XX - xx seems to get you most of
europe if you special case SHARP S - ss and THORN - th.

 s = u'ァ'
 print unicodedata.name(s)
KATAKANA LETTER SMALL A


-- 
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-24 Thread Jasen Betts
On 2009-03-18, cifroes cifr...@netcabo.pt wrote:
 This is a multi-part message in MIME format.

 --_=_NextPart_001_01C9A7E6.B32BBA87
 Content-Type: text/plain;
   charset=iso-8859-1
 Content-Transfer-Encoding: quoted-printable

 Hi,

 I have a DB in utf-8 and postgres 8.3.x.=20

 How can I do an accent insensitive search (like ...) ?

use a posix regular expression that matches the string you want:

select * from foo where colname ~ '^[CcĆćĈĉĊċČč][ĀāĂ㥹Aa][Pp][ŌōŎŏŐő]$';

you could write a function to do the translation.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Accent insensitive search?

2009-03-18 Thread cifroes
Hi,

I have a DB in utf-8 and postgres 8.3.x. 

How can I do an accent insensitive search (like ...) ?


TIA


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Pedro Doria Meunier
Hi,

use ILIKE

HTH,

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
 Hi,

 I have a DB in utf-8 and postgres 8.3.x.

 How can I do an accent insensitive search (like ...) ?


 TIA




signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread cifroes
Not case insensitive but accent insensitive :)

And I tried select to_ascii('capo','LATIN1'), to_ascii('çapo','LATIN1') and the 
results are different


-Original Message-
From: pgsql-general-ow...@postgresql.org on behalf of Pedro Doria Meunier
Sent: Wed 3/18/2009 4:46 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Accent insensitive search?
 
Hi,

use ILIKE

HTH,

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
 Hi,

 I have a DB in utf-8 and postgres 8.3.x.

 How can I do an accent insensitive search (like ...) ?


 TIA






Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
2009/3/18 Pedro Doria Meunier pdo...@netmadeira.com

 Hi,

 use ILIKE

 HTH,


ILIKE is only case-insensitive, and won't match accented characters.  The
only thing I can think of doing is to create a function which will replace
characters with their equivalent non-accented counterparts and use that in
the query.

Example: SELECT name FROM people WHERE unaccent_string(name) ~* 'Pédro';

The function would perform a reg-ex replace similar to: s/[èééêë]/e

Just an idea

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
Here's an example of a function I might use (although I haven't actually got
plperl installed, so can't test it myself, but you'll get the idea:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
my ($input_string) = @_;
$input_string =~ s/[âãäåāăą]/a;
$input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A;
$input_string =~ s/[èééêëēĕėęě]/e;
$input_string =~ s/[ĒĔĖĘĚ]/E;
$input_string =~ s/[ìíîïìĩīĭ]/i;
$input_string =~ s/[ÌÍÎÏÌĨĪĬ]/I;
$input_string =~ s/[óôõöōŏő]/o;
$input_string =~ s/[ÒÓÔÕÖŌŎŐ]/O;
$input_string =~ s/[ùúûüũūŭů]/u;
$input_string =~ s/[ÙÚÛÜŨŪŬŮ]/U;
return $input_string;
$$ LANGUAGE plperl;


Regards

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Sam Mason
On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote:
 I have a DB in utf-8 and postgres 8.3.x. 
 
 How can I do an accent insensitive search (like ...) ?

No good idea at the moment; I'd somehow expect to find this sort of
normalization in the functionality provided by the text search code.
My reasoning being that searching for a word should find it if you're
being accurate or lazy with your accents.  I've not been able to coax it
into doing anything reliable though.

If you can't find anything better in PG; the translate[1] function would
be my best suggestion.  Performance should be better than using regular
expressions.

-- 
  Sam  http://samason.me.uk/
 
 [1] http://www.postgresql.org/docs/current/static/functions-string.html

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Pedro Doria Meunier
Ooops!
Silly me! I should have read more carefully ... (blush) sorry!

Pedro Doria Meunier
GSM: +351961720188
Skype: pdoriam

On Wednesday 18 March 2009 04:46:16 pm Pedro Doria Meunier wrote:
 Hi,

 use ILIKE

 HTH,

 Pedro Doria Meunier
 GSM: +351961720188
 Skype: pdoriam

 On Wednesday 18 March 2009 04:29:24 pm cifroes wrote:
  Hi,
 
  I have a DB in utf-8 and postgres 8.3.x.
 
  How can I do an accent insensitive search (like ...) ?
 
 
  TIA




signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thom Brown
2009/3/18 Sam Mason s...@samason.me.uk


 If you can't find anything better in PG; the translate[1] function would
 be my best suggestion.  Performance should be better than using regular
 expressions.


Yeah, that does appear to perform better.  I tried the following at it
worked for me:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
DECLARE
input_string text := $1;
BEGIN

input_string := translate(input_string, 'âãäåāăąÁÂÃÄÅĀĂĄ',
'aaa');
input_string := translate(input_string, 'èééêëēĕėęěĒĔĖĘĚ',
'eee');
input_string := translate(input_string, 'ìíîïìĩīĭÌÍÎÏÌĨĪĬ',
'');
input_string := translate(input_string, 'óôõöōŏőÒÓÔÕÖŌŎŐ',
'ooo');
input_string := translate(input_string, 'ùúûüũūŭůÙÚÛÜŨŪŬŮ',
'');

return input_string;
END;
$$ LANGUAGE plpgsql;


Regards

Thom


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Thomas Kellerer

Sam Mason wrote on 18.03.2009 18:15:

On Wed, Mar 18, 2009 at 04:29:24PM -, cifroes wrote:
I have a DB in utf-8 and postgres 8.3.x. 


How can I do an accent insensitive search (like ...) ?


No good idea at the moment; I'd somehow expect to find this sort of
normalization in the functionality provided by the text search code.
My reasoning being that searching for a word should find it if you're
being accurate or lazy with your accents.  I've not been able to coax it
into doing anything reliable though.

If you can't find anything better in PG; the translate[1] function would
be my best suggestion.  Performance should be better than using regular
expressions.


Are you aware of any plans to add accent-insesitive collations to PG?

Would be great if PG offered a similar flexibility here as other databases.

Thomas


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread John R Pierce



how about...

select    where translate(lower(myfield), 
'âãäåāăąèééêëēĕėęěìíîïìĩīĭóôõöōŏőùúûüũūŭů', 
'aaaeeooouuu') = 'stringiwannamatch';



or something like that.   I may have miscounted the vowells in the 'to' 
string :)




-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Christophe
What I've done in the past in this situation is to create a separate  
field with the text normalized to whatever the search form is (all  
lower case, accents stripped, etc.), and then index and search that  
from the application.


Although I've not tried it, a functional index that did the same thing  
might work, as long as you were careful in how the queries were  
written to ensure that it was used.


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Oleg Bartunov

On Wed, 18 Mar 2009, cifroes wrote:


Hi,

I have a DB in utf-8 and postgres 8.3.x.

How can I do an accent insensitive search (like ...) ?


Take a look on text search capability and 
http://www.sai.msu.su/~megera/wiki/unaccent

We have patches for CVS HEAD, but unfortunately they will likely
go to the 8.5 release

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Alvaro Herrera
Thom Brown escribió:
 Here's an example of a function I might use (although I haven't actually got
 plperl installed, so can't test it myself, but you'll get the idea:
 
 CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
 my ($input_string) = @_;
 $input_string =~ s/[âãäåāăą]/a;
 $input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A;

Hmm, if to_ascii() doesn't work, that's something worth some research.
Maybe the encoding config is broken, for example.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent insensitive search?

2009-03-18 Thread Christophe


On Mar 18, 2009, at 11:24 AM, Alvaro Herrera wrote:

Hmm, if to_ascii() doesn't work, that's something worth some research.
Maybe the encoding config is broken, for example.


The docs say to_ascii() only works with LATIN1, LATIN2, LATIN9, and  
WIN1250; maybe convert('string', 'UTF-8', 'SQL_ASCII')?


-
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Accent-insensitive search

2007-07-13 Thread Alvaro Herrera
turbovince escribió:
 Hello, I would like to perform some accent-insensitive searches on my
 database, which means that a select query with condition, say, WHERE
 NAME = 'HELLÔ' would return records where name is 'HELLO' as well.
 
 My data is encoded in Unicode (UTF8) and therefore I cannot use
 Postgre's to_ascii() trick to achieve accent-insensitive searches.

Use the convert() function to turn it into Latin1 (or whatever encoding
you prefer), then to_ascii.

Note that if you have strange chars it will fail anyway (for example
there was this guy not long ago complaining in pgsql-es-ayuda that it
failed when he had the mu greek letter in a product description).

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo (Platon).

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] Accent-insensitive search

2007-07-12 Thread turbovince
Hello, I would like to perform some accent-insensitive searches on my
database, which means that a select query with condition, say, WHERE
NAME = 'HELLÔ' would return records where name is 'HELLO' as well.

My data is encoded in Unicode (UTF8) and therefore I cannot use
Postgre's to_ascii() trick to achieve accent-insensitive searches.

Is there any way I could perform such searches with an UTF8 encoding ?

Thank you.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Accent-insensitive search

2007-07-12 Thread Jorge Godoy
On Monday 09 July 2007 18:33:49 turbovince wrote:
 Hello, I would like to perform some accent-insensitive searches on my
 database, which means that a select query with condition, say, WHERE
 NAME = 'HELLÔ' would return records where name is 'HELLO' as well.

 My data is encoded in Unicode (UTF8) and therefore I cannot use
 Postgre's to_ascii() trick to achieve accent-insensitive searches.

 Is there any way I could perform such searches with an UTF8 encoding ?

Check the translate() documentation.  It isn't the best thing in the world to 
have to use it and code things by yourself, but it works.

-- 
Jorge Godoy  [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Accent insensitive search

2007-06-22 Thread Alvaro Herrera
Diego Manilla Suárez wrote:
 Hi. I have a few databases created with UNICODE encoding, and I would 
 like to be able to search with accent insensitivity. There's something 
 in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do 
 this, but I found nothing in PostgreSQL, just the 'to_ascii' function, 
 which AFAIK, doesn't work with UNICODE.

to_ascii() doesn't work with UTF-8, but you can use convert() to turn
the UTF8 text into Latin-1 and then use to_ascii() to remove the funny
bits.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Accent insensitive search

2007-06-21 Thread Diego Manilla Suárez
Hi. I have a few databases created with UNICODE encoding, and I would 
like to be able to search with accent insensitivity. There's something 
in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do 
this, but I found nothing in PostgreSQL, just the 'to_ascii' function, 
which AFAIK, doesn't work with UNICODE.


Thanks in advance.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread PFC


Hi. I have a few databases created with UNICODE encoding, and I would  
like to be able to search with accent insensitivity. There's something  
in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  
this, but I found nothing in PostgreSQL, just the 'to_ascii' function,  
which AFAIK, doesn't work with UNICODE.


	The easiest way is to create an extra column which will hold a copy of  
your text, with all accents removed. You can also convert it to lowercase  
and remove apostrophes, punctuation etc. Said column is kept up to date  
with a trigger.

Python is suitable for this (use unicodedata.normalize).
	Keeping a copy of the processed data will speed up search versus WHERE  
remove_accents( blah ) = 'text', even with a function index.
	Note that this function could be written in C and use a table on the  
first 64K unicode symbols for speedup.


See attached file.

create_ft_functions.sql
Description: Binary data

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread Gregory Stark

PFC [EMAIL PROTECTED] writes:

 Hi. I have a few databases created with UNICODE encoding, and I would like to
 be able to search with accent insensitivity. There's something  in Oracle
 (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  this, but I found
 nothing in PostgreSQL, just the 'to_ascii' function,  which AFAIK, doesn't
 work with UNICODE.

Postgres supports localized collation orders but what it doesn't currently
support is having multiple collation orders within the same server.

So when you initialize the server with initdb it takes the setting of
LC_COLLATE (usually from LC_ALL) and stores that permanently. If you initdb
with LC_COLLATE set to a locale like en_GB.UTF-8 or something like that you
may find one that has the behaviour you want. I think they won't be entirely
accent insensitive but they'll consider accents only if the rest of the string
is identical.

You can test the sort order of a locale by writing a file with sample words
and sorting with something like:

LC_ALL=en_GB.UTF-8 sort /tmp/data

   The easiest way is to create an extra column which will hold a copy of
 your text, with all accents removed. You can also convert it to lowercase  and
 remove apostrophes, punctuation etc. Said column is kept up to date  with a
 trigger.

That's another alternative which is useful if you need multiple collations in
your database. This gives you control over which collation is used when and
exactly what the rules are. The downside is that you have to reinvent the
collation rules which the localized collations already provide.

You don't necessarily have to keep a column in your table with the normalized
strings. You can normalize on-the-fly using an expression index as long as
your function always returns the same data given the same inputs (and is
therefore marked immutable).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Accent insensitive search

2007-06-21 Thread Albe Laurenz
PFC wrote:
 Hi. I have a few databases created with UNICODE encoding, and I would  
 like to be able to search with accent insensitivity. There's something  
 in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do  
 this, but I found nothing in PostgreSQL, just the 'to_ascii' function,  
 which AFAIK, doesn't work with UNICODE.
 
   The easiest way is to create an extra column which will hold a copy of  
 your text, with all accents removed. You can also convert it to lowercase  
 and remove apostrophes, punctuation etc. Said column is kept up to date  
 with a trigger.

Creating an extra column which holds a copy of the data is certainly not
a very good idea, as it will create unnecessary redundancy (breaking
normal forms), bloat your table, and the trigger will have a certain
performance impact.

My suggestion is to write a function that removes the accents in a string
for your language, let's call it noaccents(text).

Then you can index the column atext with

CREATE INDEX atable_atext_idx ON atable ((noaccents(atext)))

Then every query of the form

... WHERE noaccents(atext) = noaccents('SOMÉTHING')

can use the index.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] Accent insensitive search

2003-07-01 Thread Ian Barwick
On Tuesday 01 July 2003 17:11, Alejandro Javier Pomeraniec wrote:
 Hi ! Does anyone knows how to make accent insensitive searches??

 For example

 i have this data in a table

 
 Colón
 Polo
 

 I need that this query

 SELECT * FROM testtable WHERE testfield like '%olo%';

 brings both results instead of only showing Polo.

One solution[*]:

SELECT * FROM testtable WHERE to_ascii(testfield,'LATIN1') LIKE '%olo%'

Note this might not work with all database encodings, especially UNICODE.

[*] no doubt someone will be along in a moment with another.


Ian Barwick
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Accent insensitive search

2003-07-01 Thread Alvaro Herrera
On Tue, Jul 01, 2003 at 03:11:54PM +, Alejandro Javier Pomeraniec wrote:

 Hi ! Does anyone knows how to make accent insensitive searches??

Convert both the pattern and the column to ASCII with to_ascii

 SELECT * FROM testtable WHERE testfield like '%olo%';

SELECT * FROM testtable WHERE to_ascii(testfield) like '%olo%';

Do you know the pgsql-ayuda mailing list?  You may like it.
http://tlali.iztacala.unam.mx/mailman/listinfo/pgsql-ayuda

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura (Perelandra, CSLewis)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html