Summary: Abuse filter appears to mishandle unicode
           Product: MediaWiki extensions
           Version: any
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: Normal
         Component: AbuseFilter

While analyzing a false positive, I've been trying to track down why my regex
debugger says a regex doesn't match when the abuse filter does match it.
Eventually I found what appears to be a good lead on the issue.

Details of the incorrect match are here:

What seems to be going on is that the é (which appears to be encoded in UTF-8)
is mishandled when tested against the regex. The regex engine treats it as a
word boundary, so the match succeeds (specifically, "\brence\b" matches
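A minimal sketch of the suspected mechanism, assuming the filter runs the pattern through a byte-oriented regex engine (for example PCRE without UTF-8 mode, i.e. PHP's preg_* functions without the /u modifier). Whether AbuseFilter actually does this is an assumption, and Python's `re` module is only a stand-in here, not AbuseFilter's code. The input word "référence" is a hypothetical example, since the original match details are not reproduced above. In bytes mode, \w covers only ASCII letters, digits and underscore, so the second UTF-8 byte of "é" (0xA9) counts as a non-word character and a \b boundary appears right before "rence". In Unicode mode, "é" is a word character and no boundary exists.

```python
import re

# Hypothetical input: in UTF-8 each "é" becomes the two bytes 0xC3 0xA9.
text = "référence"
pattern = r"\brence\b"

# Byte-oriented matching (stand-in for a regex engine working on raw bytes):
# \w matches only ASCII [A-Za-z0-9_], so 0xA9 is a non-word byte and
# \b succeeds between it and the following "r".
byte_match = re.search(pattern.encode("ascii"), text.encode("utf-8"))

# Unicode-aware matching: "é" is a word character, so there is no
# boundary before "rence" and the pattern does not match.
unicode_match = re.search(pattern, text)

print(byte_match.group() if byte_match else None)
print(unicode_match)
```

The first print shows the spurious byte-level match of "rence"; the second prints None, which would agree with a Unicode-aware regex debugger.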

Hopefully there's a way to correct this, and it isn't a problem deep in the
heart of PHP itself.

Please let me know if you need any additional information.

-- Shirik @ enwiki

Wikibugs-l mailing list
