https://bugzilla.wikimedia.org/show_bug.cgi?id=70657
Bug ID: 70657
Summary: Same page served with two different addresses, with two
different rel canonical
Product: Wikimedia
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: trivial
Priority: Unprioritized
Component: Language setup
Assignee: [email protected]
Reporter: [email protected]
Web browser: ---
Mobile Platform: ---
I think I found two intertwined "bugs":
== Wikipedia accepts invalid URIs in HTTP requests ==
According to the URI RFCs, such as 2396 and 3986: "A URI is a sequence of characters
from a very limited set, i.e. the letters of the basic Latin alphabet, digits,
and a few special characters."
I'm aware of URI variants such as IRIs, which allow non-ASCII characters, BUT the
HTTP RFC specifies that HTTP accepts URIs, not IRIs. This does NOT make IRIs
useless: we can still use IRIs in browsers, whose role is to convert them to valid
URIs (with knowledge of the local encoding).
So the request below could legitimately fail, typically with a 400 Bad Request,
instead of returning a 200 OK:
$ curl -si http://ar.wikipedia.org/wiki/حب | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/حب" />
But if Wikipedia returns a 200, there may be a reason for it, and I think this
ticket is a good opportunity to document it.
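To illustrate the conversion a client is expected to perform before sending the request, here is a minimal Python sketch. The helper name iri_to_uri is hypothetical, and the sketch assumes UTF-8 and only touches the path and query (a non-ASCII host name would need IDNA handling, which is left out):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Hypothetical helper for illustration only; host and fragment are
    passed through unchanged.
    """
    parts = urlsplit(iri)
    # "%" is kept safe so that already-encoded input is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(iri_to_uri("http://ar.wikipedia.org/wiki/حب"))
# prints http://ar.wikipedia.org/wiki/%D8%AD%D8%A8
```

An already-encoded URI passes through this helper unchanged, so applying it twice is harmless.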
== Due to the previous bug, Wikipedia has the same page behind two different URIs with two different rel-canonical ==
$ urlencode 'حب'
%D8%AD%D8%A8
$ curl -si http://ar.wikipedia.org/wiki/%D8%AD%D8%A8 | grep 'canonical\|HTTP/1.1'
HTTP/1.1 200 OK
<link rel="canonical" href="http://ar.wikipedia.org/wiki/%D8%AD%D8%A8" />
And I think this one is clearly not normal: when the invalid URI is requested and
no 400 is returned, rel canonical should be set to the encoded (valid) form.
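The behaviour I would expect can be sketched in a few lines of Python. This is only an illustration under assumptions, not how MediaWiki actually builds the link: the function name canonical_href is hypothetical, UTF-8 is assumed, ":" is kept unescaped in the path, and query string and fragment are simply dropped.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonical_href(requested_url: str) -> str:
    """Return one percent-encoded form for both raw and encoded requests.

    Hypothetical sketch: decode any existing escapes, then re-encode the
    path as UTF-8. Decoding first is a simplification (an escaped "/" in
    a title would not survive it).
    """
    parts = urlsplit(requested_url)
    path = quote(unquote(parts.path), safe="/:")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


raw = canonical_href("http://ar.wikipedia.org/wiki/حب")
encoded = canonical_href("http://ar.wikipedia.org/wiki/%D8%AD%D8%A8")
print(raw)             # http://ar.wikipedia.org/wiki/%D8%AD%D8%A8
print(raw == encoded)  # True
```

With something like this, both requests above would advertise the same rel canonical.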