https://bugzilla.wikimedia.org/show_bug.cgi?id=58316

--- Comment #3 from Derk-Jan Hartman <[email protected]> ---
In summary:
* Entries in the Apache log that look like: Robinson_Can\xC3\xB3
which is UTF-8 encoded: \xC3\xB3 are the two UTF-8 bytes of ó. (Likely a
representation of a request containing Robinson_Canó that was not percent
encoded, [possibly even an IRI request?])

* Log entries are NOT canonical on this front. A request for Robinson_Canó is
logged differently than a request for Robinson_Can%C3%B3.

* The statistics of stats.grok.se might not handle these properly (collating
them, ignoring them, or just not making them accessible?)

* Someone else made a tool to detect red links, which does make the \x entries
accessible/visible.

* Someone is making mass redirects from \x entries to what they consider to be
'proper' entries. This seems to affect the statistics, but I would say that if
the statistics/tools are broken, you are most likely only influencing the
statistics, not actually fixing anything.

* There seems to have been a large increase of these kinds of requests (newer
browsers or google/bing.com changing their defaults can easily account for
this).

* You cannot input a UTF-8 byte sequence in the URL field of a browser (because
there is no need for this, you would just input ó).

* People can't figure out who is wrong and who is right.
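
To make the first two points concrete, here is a minimal Python sketch of how
the three spellings could be collapsed into one title before counting. It is
only an illustration: the function names are hypothetical, it assumes the \x
entries are plain \xHH escapes of raw UTF-8 bytes, and it does not handle other
escapes the logger may emit (such as an escaped backslash). It is not the code
used by stats.grok.se or by any of the tools mentioned above.

```python
import re
from urllib.parse import quote, unquote_to_bytes

# Matches a literal backslash, "x", and two hex digits, e.g. \xC3
_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def log_escaped_to_bytes(entry: str) -> bytes:
    """Turn \\xHH escapes in a log entry back into the raw bytes."""
    raw = entry.encode("utf-8")
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def canonical_title(entry: str) -> str:
    """Hypothetical helper: undo \\xHH and %HH encoding, decode as UTF-8."""
    raw = unquote_to_bytes(log_escaped_to_bytes(entry))
    # Invalid byte sequences are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for entry in (r"Robinson_Can\xC3\xB3", "Robinson_Can%C3%B3", "Robinson_Canó"):
        title = canonical_title(entry)
        # Print the decoded title and its percent-encoded form.
        print(entry, "->", title, "/", quote(title))
```

With a normalisation step like this, all three spellings would be counted under
the same page, so no redirects would be needed just to merge the statistics.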

Does that sum it up a bit ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
