https://bugzilla.wikimedia.org/show_bug.cgi?id=22761
Summary: Abuse filter appears to mishandle unicode
Product: MediaWiki extensions
Version: any
Platform: All
OS/Version: All
Status: NEW
Severity: major
Priority: Normal
Component: AbuseFilter
AssignedTo: agarr...@wikimedia.org
ReportedBy: delbu...@my.erau.edu
CC: wikibugs-l@lists.wikimedia.org

In analyzing a false positive, I've been trying to track down why my regex debugger says a regex doesn't match while the abuse filter reports that it does. Eventually I found what appears to be a good lead on the issue. Details of the incorrect match are here:

http://test.wikipedia.org/w/index.php?title=Special:AbuseLog&details=1784

It appears that the é (which appears to be encoded in UTF-8) is mishandled when the text is tested against the regex. The regex engine seems to treat it as a non-word character, so a word boundary is found right after it and the match succeeds (specifically, "\brence\b" matches "conférence").

Hopefully there's a way to correct this and it's not a problem in the heart of PHP instead. Please let me know if you need any additional information.

-- Shirik @ enwiki
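
The suspected behaviour can be reproduced outside AbuseFilter. The following is only a minimal sketch in Python, not the extension's PHP code: it assumes the engine is matching against raw UTF-8 bytes with ASCII-only word rules, which is what Python's re module does for bytes patterns. The helper names matches_bytewise and matches_unicode are made up for this illustration.

```python
import re

PATTERN = r"\brence\b"
TEXT = "conférence"


def matches_bytewise(pattern: str, text: str) -> bool:
    """Match against the raw UTF-8 bytes.

    For bytes patterns, Python's re treats only ASCII letters, digits and
    underscore as word characters. The two bytes of "é" (0xC3 0xA9) are
    therefore non-word bytes, and a word boundary appears before "rence".
    """
    return re.search(pattern.encode("utf-8"), text.encode("utf-8")) is not None


def matches_unicode(pattern: str, text: str) -> bool:
    """Match against decoded text.

    For str patterns, \\b is Unicode-aware, so "é" counts as a word
    character and there is no boundary between "é" and "r".
    """
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(matches_bytewise(PATTERN, TEXT))  # True  (the false positive)
    print(matches_unicode(PATTERN, TEXT))   # False (what the debugger reports)
```

If the byte-wise result is what the abuse filter produces, that would be consistent with the report above, but whether AbuseFilter or PCRE actually behaves this way still needs to be confirmed against the PHP code.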