Hi,
We do have languages that are not supported by CLDR locales. Does Unicode
on its own suffice?
Thanks.
GerardM
2009/1/10 Greg Hewgill <[email protected]>
> 2009/1/11 Gerard Meijssen <[email protected]>:
> > How many characters are there according to your software in the word
> > Mbɔ́tɛ? The correct answer is 5
>
> Since I was working with the enwiki dump, I did not pay much attention
> to internationalisation issues. I arbitrarily defined a "word" as the
> Python regular expression: [\w\d]+
>
> So, the answer to your question depends on how Python implements the
> \w word-matching regular expression atom:
>
> "When the LOCALE and UNICODE flags are not specified, matches any
> alphanumeric character and the underscore; this is equivalent to the
> set [a-zA-Z0-9_]. With LOCALE, it will match the set [0-9_] plus
> whatever characters are defined as alphanumeric for the current
> locale. If UNICODE is set, this will match the characters [0-9_] plus
> whatever is classified as alphanumeric in the Unicode character
> properties database. "
>
> Greg Hewgill
> http://hewgill.com
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
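
To make the quoted \w behaviour concrete, here is a minimal sketch, not
anyone's actual dump-processing code. It assumes Python 3, where str patterns
are Unicode-aware by default and re.ASCII gives the [a-zA-Z0-9_] behaviour
described above. It also assumes the accent in Mbɔ́tɛ is stored as a combining
mark (U+0301) after ɔ. count_base_characters is a hypothetical helper and only
a rough stand-in for real grapheme counting.

```python
import re
import unicodedata

# "Mbɔ́tɛ" spelled out as code points: M, b, U+0254 (ɔ), U+0301 (combining
# acute accent), t, U+025B (ɛ). Assumption: the accent is a combining mark.
WORD = "Mb\u0254\u0301t\u025b"


def count_base_characters(text):
    """Count code points that are not combining marks.

    Hypothetical helper: a crude approximation of user-perceived
    characters, not a full grapheme cluster segmentation.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# ASCII-only matching: \w is [a-zA-Z0-9_], so ɔ, the accent and ɛ all
# break the word.
ascii_words = re.findall(r"[\w\d]+", WORD, re.ASCII)

# Unicode matching: ɔ and ɛ count as letters, but the combining accent is
# not alphanumeric, so the word is still split in two.
unicode_words = re.findall(r"[\w\d]+", WORD)

print(len(WORD))                    # 6 code points
print(count_base_characters(WORD))  # 5
print(ascii_words)                  # ['Mb', 't']
print(unicode_words)                # ['Mbɔ', 'tɛ']
```

So under these assumptions the Unicode flag on its own does not keep the word
together. The pattern would need to allow combining marks as well, or the text
would need to be segmented into grapheme clusters first.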