https://bugzilla.wikimedia.org/show_bug.cgi?id=47733
--- Comment #1 from Matthias Mullie <[email protected]> ---

ß & ö are indeed the culprits. PHP's native str_word_count is used, which isn't multibyte-safe.

However, using a regex that matches letters (including those with diacritics) is not ideal either, since it would count words like "you're" or hyphenated words as multiple words (and other languages quite possibly have other combinations with other characters). That would be substituting one bad solution for another suboptimal one.

Perhaps we should split on whitespace, remove all occurrences without letters, and count what remains?

Besides, the character length is wrong too, but switching strlen for mb_strlen should do the trick.
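
The whitespace-splitting idea from the comment could look roughly like the sketch below. It is only an illustration, not the extension's actual code: the function names countWordsByWhitespace and countCharacters are hypothetical, and it assumes UTF-8 input, PHP 7 or later, and the mbstring extension.

```php
<?php
// Hypothetical helpers illustrating the proposal; names are not from the extension.

/**
 * Count whitespace-separated tokens that contain at least one letter.
 */
function countWordsByWhitespace( string $text ): int {
    // Split on runs of whitespace; the /u modifier makes PCRE treat
    // the pattern and the subject as UTF-8.
    $tokens = preg_split( '/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY );
    if ( $tokens === false ) {
        // preg_split fails on invalid UTF-8; treat that as "no words" here.
        return 0;
    }

    // Keep only tokens with at least one Unicode letter (\p{L}), so that
    // "you're" and "well-known" count once and "-" or "123" are dropped.
    $words = array_filter( $tokens, function ( string $token ): bool {
        return preg_match( '/\p{L}/u', $token ) === 1;
    } );

    return count( $words );
}

/**
 * Count characters rather than bytes.
 */
function countCharacters( string $text ): int {
    return mb_strlen( $text, 'UTF-8' );
}
```

With this sketch, "Die Straße ist schön - you're well-known 123" yields 6 words, and countCharacters( 'Straße' ) returns 6, where strlen would report 7 bytes.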
