https://bugzilla.wikimedia.org/show_bug.cgi?id=47733

       Web browser: ---
            Bug ID: 47733
           Summary: Word count is wrong, does not recognize non-ASCII
                    characters
           Product: MediaWiki extensions
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: minor
          Priority: Unprioritized
         Component: ArticleFeedbackv5
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

The following example is counted as 42 words, but I count only 40.

http://de.wikipedia.org/wiki/Spezial:Artikelr%C3%BCckmeldungen_v5/Yellowstone-Nationalpark/04f917900607eb1692a1842b2b77d79c

I think the current count only treats runs of the letters a to z as words.
Because of this, a German word like "schönen" is split at the "ö" and counted
as two words.

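The best solution would be to use \p{L} (any Unicode letter) instead of \w or
[a-z] in the regular expression. Please note that \p{L} is not supported by
JavaScript regular expressions (Unicode property escapes were only added later,
in ES2018, and require the u flag).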

http://www.regular-expressions.info/unicode.html
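A minimal sketch of the difference, assuming (as the report guesses) that the current count matches only ASCII letters. It is not the extension's actual code, and the sample sentence is hypothetical, not taken from the linked feedback. Python's built-in re module has no \p{L}, so the Unicode pattern below approximates it with [^\W\d_]: for str patterns Python's \w is Unicode-aware, and removing digits and the underscore leaves (mostly) letters.

```python
import re
import unicodedata

# Behaviour suspected in the report: only ASCII letters form words,
# so "schönen" is split at the "ö" into "sch" and "nen".
ASCII_WORD = re.compile(r"[A-Za-z]+")

# Approximation of \p{L}+ with the standard library: Unicode word
# characters minus digits and underscore.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def count_words(text: str, pattern: re.Pattern) -> int:
    """Count non-overlapping matches of `pattern` in `text`."""
    # Normalise to NFC so that "o" + U+0308 (combining diaeresis) becomes
    # the single precomposed letter "ö" before matching.
    text = unicodedata.normalize("NFC", text)
    return len(pattern.findall(text))


if __name__ == "__main__":
    sample = "Einen schönen Tag im Nationalpark"  # hypothetical example text
    print(count_words(sample, ASCII_WORD))    # 6: "sch" and "nen" counted separately
    print(count_words(sample, UNICODE_WORD))  # 5
```

The NFC step matters because even \p{L} alone would split a decomposed "ö", since the combining diaeresis is a mark (\p{M}), not a letter. In JavaScript engines that support ES2018, the equivalent pattern is /\p{L}+/gu.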

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l