https://bugzilla.wikimedia.org/show_bug.cgi?id=47733

--- Comment #1 from Matthias Mullie <[email protected]> ---
ß & ö are indeed the culprits.
PHP's native str_word_count is used, which isn't mb-safe.
However, using a regex that matches letters (including those with diacritics)
is not ideal either, since it would count words like "you're" or hyphenated
words as multiple words (and other languages quite possibly have similar
combinations with other characters). That would just be substituting one bad
solution for another sub-optimal one.
Perhaps we should split on whitespace, discard all tokens that contain no
letters, and count what remains?
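
A rough sketch of that whitespace-based idea, assuming UTF-8 input and a PCRE
build with Unicode property support. countWords is a made-up name for
illustration, not a function that exists in the code today:

```php
<?php
// Hypothetical helper (not existing code): count whitespace-separated
// tokens that contain at least one letter.
function countWords(string $text): int
{
    // Split on runs of whitespace; the /u modifier makes PCRE treat the
    // subject as UTF-8. PREG_SPLIT_NO_EMPTY drops empty leading/trailing pieces.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        // preg_split() returns false on error, e.g. invalid UTF-8 input.
        return 0;
    }

    // Keep only tokens with at least one letter; \p{L} matches a letter
    // in any script, so "ß" and "ö" qualify.
    $words = array_filter($tokens, function (string $token): bool {
        return preg_match('/\p{L}/u', $token) === 1;
    });

    return count($words);
}
```

With this, "you're" and "well-known" each count as one word, while a lone "-"
or "123" is not counted at all. Whether numbers should count as words is a
separate decision this sketch does not try to settle.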

Besides, the character length is wrong too, but switching strlen for mb_strlen
should do the trick.
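
To illustrate the difference, a small example assuming UTF-8 text and the
mbstring extension being available (the variable names are only for
illustration):

```php
<?php
// "Straße", written with an escape so the example does not depend on the
// encoding of this file. "ß" takes two bytes in UTF-8.
$text = "Stra\u{00DF}e";

$bytes = strlen($text);             // counts bytes: 7
$chars = mb_strlen($text, 'UTF-8'); // counts characters: 6
```

Passing the encoding explicitly avoids relying on whatever the internal
mbstring encoding happens to be configured as.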
