Tibor Simko <[email protected]> wrote:
>
> On this accent stripping topic, I have an almost finished branch that
> should take care of ASCII'fication of Czech and many other languages
> properly out of the box.  The only exceptions may be the CJK family of
> languages and Greek, for which opinions differ:
>
>   https://github.com/inveniosoftware/invenio/issues/1675
>
> Here is an example:
>
>   In [1]: x = "Všichni lidé se rodí svobodní a sobě rovní " \
>               "co do důstojnosti a práv."
>
>   In [2]: from invenio.textutils import strip_accents
>
>   In [3]: strip_accents(x)
>   'Vsichni lide se rodi svobodni a sobe rovni co do dustojnosti a prav.'

May I ask how it handles the Catalan middle dot character, used between
two l's (that is, «l·l») in words like «paral·lel» or «cal·ligrafia»?
Traditionally, it should be ignored for searching: one is supposed to be
able to search for «parallel» or «calligrafia», but «paral·lel» or
«cal·ligrafia» should give proper results too (see
http://fr.wikipedia.org/wiki/L%C2%B7L).

Unfortunately, there is a typographically incorrect variant, I suspect
produced by Windows, with a heavier character (e.g. «col•lectiu»), but
more likely that should be corrected locally.

Thanks,
Ferran
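
P.S. To make the expected behaviour concrete, here is a minimal sketch of the folding I have in mind. It is only an illustration, not how invenio.textutils works: the name fold_catalan_middle_dot is hypothetical, and I am assuming that the correct character is U+00B7 (MIDDLE DOT) and that the incorrect variant is U+2022 (BULLET). The dot is dropped only when it sits between two l's, so a middle dot or bullet used elsewhere is left alone.

```python
import re

# Hypothetical helper, not part of invenio.textutils.
# Assumption: U+00B7 MIDDLE DOT is the correct character and
# U+2022 BULLET is the typographically incorrect variant.
_DOT_BETWEEN_LS = re.compile("(?<=[lL])[\u00b7\u2022](?=[lL])")


def fold_catalan_middle_dot(text):
    """Drop a middle dot (or bullet) that stands between two l's."""
    return _DOT_BETWEEN_LS.sub("", text)


if __name__ == "__main__":
    for word in ("paral\u00b7lel", "cal\u00b7ligrafia", "col\u2022lectiu"):
        print(word, "->", fold_catalan_middle_dot(word))
```

Applying the same folding to both the indexed text and the query would let «parallel» and «paral·lel» find the same records.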

