But I just can't make it work correctly using brackets:
SELECT field FROM table WHERE field ~* 'ch[aã]o';

It just returns tuples that have 'chao', but not 'chão'.

My queries are UTF-8 and the database is SQL_ASCII.

I suspect the bracketed expression is turning into [aXY], where X and Y are the two bytes that encode ã in UTF8. A SQL_ASCII database treats every byte as its own character, so the bracket matches exactly one of those three bytes, and the regular expression is only going to match strings of the form chao, chXo and chYo. To make sure that this is what's happening, try this:

  select length('ã');

I bet you get back 2, not 1. I don't know if a UTF8 database will handle this correctly or not. The safest thing to do may be to use queries like this:

  SELECT field FROM table WHERE field ~* 'ch(a|ã)o';
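
To see why the alternation should behave differently from the bracket, here is a rough sketch in Python that mimics byte-at-a-time matching. It is only a model of what I suspect is going on, not PostgreSQL's actual regex engine; the helper names (as_bytes, byte_match) are made up for the illustration, and it assumes ã is stored as the single precomposed code point U+00E3.

```python
import re

A_TILDE = "\u00e3"  # ã as one precomposed code point (assumption)

# The two patterns from the thread, built with the escaped character.
BRACKET = "ch[a" + A_TILDE + "]o"
ALTERNATION = "ch(a|" + A_TILDE + ")o"


def as_bytes(text: str) -> bytes:
    """Encode text as UTF-8, the way the client sends it."""
    return text.encode("utf-8")


def byte_match(pattern: str, value: str) -> bool:
    """Match pattern against value one byte at a time.

    Both sides are encoded to UTF-8 first, so a bracket expression
    containing ã becomes a class of three separate bytes.
    """
    return re.search(as_bytes(pattern), as_bytes(value), re.IGNORECASE) is not None


def main() -> None:
    words = ["chao", "ch" + A_TILDE + "o"]
    print("bytes in a-tilde:", len(as_bytes(A_TILDE)))
    for word in words:
        print(repr(word), "bracket:", byte_match(BRACKET, word),
              "alternation:", byte_match(ALTERNATION, word))


if __name__ == "__main__":
    main()
```

In this model the bracket consumes only the first byte of ã and then fails to find the o, while the alternation consumes both bytes as a unit, which is why the second form looks like the safer one.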

- John D. Burger
  MITRE
