https://bugzilla.wikimedia.org/show_bug.cgi?id=47733
Web browser: ---
Bug ID: 47733
Summary: Word count is wrong, does not recognize non-ASCII characters
Product: MediaWiki extensions
Version: master
Hardware: All
OS: All
Status: NEW
Severity: minor
Priority: Unprioritized
Component: ArticleFeedbackv5
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
The following example is counted as 42 words, but I count only 40.
http://de.wikipedia.org/wiki/Spezial:Artikelr%C3%BCckmeldungen_v5/Yellowstone-Nationalpark/04f917900607eb1692a1842b2b77d79c
I think the current implementation counts runs of the letters a to z as words.
Because of this, a German word like "schönen" is counted as two words ("sch"
and "nen").
The best solution would be to use \p{L} instead of \w or [a-z] in the regular
expression. Please note that \p{L} does not work in JavaScript regular expressions.
http://www.regular-expressions.info/unicode.html
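The difference between the two patterns can be reproduced with a small sketch. It assumes, as the report guesses, that the buggy counter treats a word as a run of ASCII letters; the pattern and function names below are hypothetical and are not taken from the ArticleFeedbackv5 code. Python's built-in `re` module has no \p{L}, so the sketch uses [^\W\d_] (Unicode word characters minus digits and underscore) as a stdlib approximation of "any letter".

```python
import re

# Hypothetical reconstruction of the suspected bug: a "word" is a run of
# ASCII letters only, so any non-ASCII letter acts as a word separator.
ASCII_WORD = re.compile(r"[A-Za-z]+")

# Python's stdlib `re` has no \p{L}. For str patterns \w is Unicode-aware,
# so [^\W\d_] ("word characters except digits and underscore") is a common
# approximation of "any Unicode letter".
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def count_words_ascii(text: str) -> int:
    """Count words the way the report suspects the extension does."""
    return len(ASCII_WORD.findall(text))


def count_words_unicode(text: str) -> int:
    """Count words as runs of Unicode letters, so umlauts stay inside a word."""
    return len(UNICODE_WORD.findall(text))


if __name__ == "__main__":
    sample = "Der Park hat einen schönen Geysir"
    print(ASCII_WORD.findall(sample))   # 'schönen' is split into 'sch' and 'nen'
    print(count_words_ascii(sample))    # 7
    print(count_words_unicode(sample))  # 6
```

With the third-party `regex` module, the pattern `\p{L}+` can be used directly, which matches the suggestion in this report more closely than the stdlib approximation.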
--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l