--- Comment #9 from Philippe Verdy <> ---
See also bug 35628 about the weird way the various parser functions interpret
(or fail to interpret) their input (URL-decoding, HTML-decoding, sometimes mixed
up!), and how they may or may not re-encode their output.
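Here is a small Python sketch of the ordering problem. It uses only the standard library (`urllib.parse.unquote` and `html.unescape`) and is not MediaWiki's parser code. The two helper names are made up for illustration. It shows that URL-decoding then HTML-decoding, and the reverse order, give different results for the same input:

```python
from html import unescape
from urllib.parse import unquote


def url_then_html(s: str) -> str:
    # Hypothetical pipeline A: percent-decode first, then decode HTML entities.
    return unescape(unquote(s))


def html_then_url(s: str) -> str:
    # Hypothetical pipeline B: decode HTML entities first, then percent-decode.
    return unquote(unescape(s))


for sample in ("%26amp%3B", "&#37;41"):
    # Each input yields two different strings depending on the decoding order.
    print(repr(sample), repr(url_then_html(sample)), repr(html_then_url(sample)))
```

If a string is decoded twice, or in an inconsistent order, it can end up as a different title from the one the user actually typed.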

If this were not already complex enough with ASCII alone, it becomes a nightmare
with non-ASCII characters. This is not because they are UTF-8 encoded (that is
just a convention), but because of how non-ASCII bytes are handled: they may
represent the UTF-8 sequence of a single character... or not, because MediaWiki
accepts invalid Unicode characters such as U+FFFF when they are pseudo-encoded as
UTF-8 and then URL-encoded as %xx hex sequences! At the API level, any
%xx-encoded byte is accepted, and UTF-8 validity is in fact not enforced.
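Below is a minimal sketch of what strict checking of percent-encoded input could look like. It assumes, hypothetically, that a title arrives as a %xx-encoded string. The check decodes the bytes, requires valid UTF-8, and then rejects Unicode noncharacters such as U+FFFF. Python's strict `utf-8` codec already rejects overlong forms and encoded surrogates. U+FFFF itself is well-formed UTF-8 (EF BF BF), so it needs a separate noncharacter check. The function name `check_percent_encoded_title` is hypothetical and is not a MediaWiki API:

```python
from typing import Optional
from urllib.parse import unquote_to_bytes


def is_noncharacter(cp: int) -> bool:
    # Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points
    # of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_percent_encoded_title(s: str) -> Optional[str]:
    """Hypothetical validator: return None if acceptable, else a reason."""
    raw = unquote_to_bytes(s)  # %xx sequences -> raw bytes, no charset check yet
    try:
        # Strict decoding rejects overlong forms, encoded surrogates and
        # truncated multi-byte sequences.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return f"invalid UTF-8 at byte {e.start}"
    for ch in text:
        if is_noncharacter(ord(ch)):
            return f"noncharacter U+{ord(ch):04X}"
    return None


print(check_percent_encoded_title("Caf%C3%A9"))  # -> None
print(check_percent_encoded_title("%EF%BF%BF"))  # -> noncharacter U+FFFF
```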

The server just treats *raw* sequences of bytes: it filters only some ASCII
characters, does not restrict at all the range of bytes from 0x80 to 0xFF, and
later does not restrict the 16-bit code units to anything narrower than the full
range 0x0020 to 0xFFFF (when they are used in various libraries working with
UTF-16 code units instead of real 21-bit code points).
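The gap between 16-bit code units and 21-bit code points shows up in a short Python sketch. The helper names are hypothetical. Python strings are sequences of code points, so the UTF-16 units are obtained by encoding. A character outside the BMP becomes a surrogate pair. A library that accepts any unit in 0x0020 to 0xFFFF will therefore also accept an unpaired surrogate, which has no valid UTF-8 form:

```python
from typing import List


def utf16_units(s: str) -> List[int]:
    # Encode to UTF-16LE and split the bytes into 16-bit code units.
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_well_formed_utf16(units: List[int]) -> bool:
    # A high surrogate (D800..DBFF) must be followed by a low surrogate
    # (DC00..DFFF); a low surrogate must never appear on its own.
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            return False
        i += 1
    return True


# One code point (U+1F600), but two UTF-16 code units.
emoji = "\U0001F600"
print(len(emoji), [hex(u) for u in utf16_units(emoji)])  # -> 1 ['0xd83d', '0xde00']
# A lone high surrogate lies inside 0x0020..0xFFFF yet is not well-formed UTF-16.
print(is_well_formed_utf16([0xD83D]))  # -> False
```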

I wonder whether this inconsistency could be used to defeat some security
restrictions, for example to violate access rights on blocked pages. It is
possible that one could create, via the HTTP API, some weird page names that
will later not be accessible from any other MediaWiki page, nor by wiki
administrators with their online tools. Someone could maliciously create such
weird page names to fill a category, or some generated MediaWiki pages that list
the pages in categories.

A user could possibly also create a user account with such a weird name, and
have his user page be inaccessible from the standard blocking tools.
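One possible defensive check for such weird page titles or user names is sketched below in Python using `unicodedata`. It flags control characters, lone surrogates, invisible format characters (such as U+200B ZERO WIDTH SPACE), noncharacters, and strings that change under NFC normalization. This is a hypothetical filter for illustration only and is not MediaWiki's actual title validation. The function name `suspicious_title_chars` is made up:

```python
import unicodedata
from typing import List


def suspicious_title_chars(title: str) -> List[str]:
    """Hypothetical check: list reasons why a title or user name looks unsafe."""
    problems = []
    for i, ch in enumerate(title):
        cp = ord(ch)
        cat = unicodedata.category(ch)
        if cat == "Cc":
            problems.append(f"{i}: control U+{cp:04X}")
        elif cat == "Cs":
            problems.append(f"{i}: lone surrogate U+{cp:04X}")
        elif cat == "Cf":
            # Format characters are usually invisible when rendered.
            problems.append(f"{i}: invisible format U+{cp:04X}")
        elif 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            problems.append(f"{i}: noncharacter U+{cp:04X}")
    # Two titles that render identically should not differ only in normalization.
    if unicodedata.normalize("NFC", title) != title:
        problems.append("not in NFC")
    return problems


print(suspicious_title_chars("Main\u200bPage"))  # -> ['4: invisible format U+200B']
```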

CheckUser admins may also have difficulty reading the logs and finding the relevant

Wikibugs-l mailing list
