On Fri, 2013-05-17, Olive wrote:

> One feature that seems to be missing in the re module (or any tools
> that I know for searching text) is "diacritic-insensitive search". I
> would like to have a match for something like this:

> re.match("franc", "français")
...

> The algorithm to write such a function is trivial but there are a
> lot of marks we can put on a letter. It would be necessary to have the
> list of "a"'s with something on it, i.e. "à,á,ã", etc., and this for
> every letter. Trying to make such a list by hand would inevitably lead
> to some symbols being forgotten (and would be tedious).
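
For what it's worth, the list does not have to be made by hand. A
minimal sketch, assuming it is acceptable to drop every combining mark:
the standard unicodedata module can decompose "ç" into "c" plus a
combining cedilla (normalization form NFD), after which the marks can
be filtered out. The helper names strip_marks and match_ignoring_marks
are made up for this illustration; they are not part of re.

```python
import re
import unicodedata


def strip_marks(text):
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a precomposed letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is nonzero for the usual combining accents.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_ignoring_marks(pattern, text):
    """re.match on the mark-stripped pattern and text (hypothetical helper)."""
    return re.match(strip_marks(pattern), strip_marks(text))


print(match_ignoring_marks("franc", "français"))
```

Letters that have no decomposition, such as "ø", pass through
unchanged, so this is not a complete answer either.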

Ok, but please remember that diacriticals are of varying importance.
The English "naïve" is easily recognizable when written as "naive".
The Swedish word "får" ("sheep") cannot be spelled "far" ("father")
and still be understood.

This is IMHO out of the scope of re, and perhaps case-insensitivity
should have been too.  Perhaps it /would/ have been, if regular
expressions hadn't come from the ASCII world where these things are
easy.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .