Tibor Simko <[email protected]> wrote:
> 
> On this accent stripping topic, I have an almost finished branch that
> should take care of ASCII'fication of Czech and many other languages
> properly out of the box.  The only exceptions may be the CJK family of
> languages and Greek, for which opinions differ:
>
>    https://github.com/inveniosoftware/invenio/issues/1675
>
> Here is an example:
>
>  In [1]: x = "Všichni lidé se rodí svobodní a sobě rovní " \
>              "co do důstojnosti a práv."
>    
>  In [2]: from invenio.textutils import strip_accents
>    
>  In [3]: strip_accents(x)
>  'Vsichni lide se rodi svobodni a sobe rovni co do dustojnosti a prav.'

May I ask how it handles the Catalan middle dot character, used between
two l's (that is, «l·l») in words like «paral·lel» or «cal·ligrafia»?
Traditionally, it should be ignored for searching: one is supposed to
search for «parallel» or «calligrafia», but «paral·lel» or
«cal·ligrafia» should give proper results too
(see http://fr.wikipedia.org/wiki/L%C2%B7L).

Unfortunately, there is a typographically incorrect variant, which I
suspect is produced by Windows, with a heavier character (e.g.
«col•lectiu»), but that should more likely be corrected locally.
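
To make the expected behaviour concrete, here is a minimal sketch in
plain Python. It is only an illustration: the helper name
fold_middle_dot is hypothetical and is not part of invenio.textutils,
and nothing here is meant to describe how strip_accents works. It
assumes the correct character is U+00B7 (MIDDLE DOT) and that the
heavier variant is U+2022 (BULLET), and it removes either one only when
it sits directly between two l's.

```python
import re

# Characters seen between the two l's: U+00B7 MIDDLE DOT (the correct
# one) and U+2022 BULLET (assumed to be the heavier, incorrect variant).
_DOTS = "\u00b7\u2022"

# Match a dot only when it sits directly between two l's (either case),
# so that a middle dot used elsewhere in the text is left untouched.
_ELA_GEMINADA = re.compile("(?<=[lL])[" + _DOTS + "](?=[lL])")


def fold_middle_dot(text):
    """Hypothetical helper: drop the dot inside «l·l» so that
    «paral·lel» and «parallel» reduce to the same search term."""
    return _ELA_GEMINADA.sub("", text)


for word in ("paral\u00b7lel", "cal\u00b7ligrafia", "col\u2022lectiu"):
    print(word, "->", fold_middle_dot(word))
```

If the same folding were applied both to the indexed text and to the
query, a search for either spelling would presumably find the other.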

Thanks,

Ferran
