https://bugzilla.wikimedia.org/show_bug.cgi?id=35746
Philippe Verdy <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #8 from Philippe Verdy <[email protected]> ---

The safest way to compare page names is to pass BOTH of them through
{{PAGENAME|pagename}}, or BOTH through {{PAGENAMEE|pagename}}. If you also want
to compare their namespaces, pass both page names as the parameter of
{{FULLPAGENAME|pagename}}, so that the given page name does not have its
namespace parsed and removed. Note that these functions also resolve relative
paths in subpages, and FULLPAGENAME(E) also resolves the namespace.

So:
  {{#ifeq: {{PAGENAME}}|Q & A|true|false}}
will always be false on every page (because {{PAGENAME}} returns the
HTML-encoded name, never the literal "Q & A"), but the following will work:
  {{#ifeq: {{PAGENAME}}|{{PAGENAME|Q & A}}|true|false}}
as it returns "true" on the expected page.

With full page names, where you also check the namespace:
  {{#ifeq: {{FULLPAGENAME}}|{{FULLPAGENAME|Q & A}}|true|false}}
will also return true, but only in the main namespace (it will be false on a
Category page named "Category:Q & A", because the second parameter of "#ifeq"
gets the full page name of the page "Q & A" in the main namespace).

-----
In summary:
* {{(FULL|BASE|SUB)PAGENAMEE|...}} return URL-encoded names
* {{(FULL|BASE|SUB)PAGENAME|...}} return HTML-encoded names

There's NO function in MediaWiki that returns the raw pagename.

-----
But note:
  {{(FULL|BASE|SUB)PAGENAMEE|...}}
is also different from
  {{URLENCODE:{{(FULL|BASE|SUB)PAGENAME|...}}}}

In the latter case, URLENCODE receives an HTML-encoded name as its parameter,
so the result is double-encoded: the HTML entities (containing the characters
& # ;) and the SPACEs are URL-encoded using %nn and +.

In the first case, the MediaWiki-specific URL-encoding performed by PAGENAMEE
differs from standard URL-encoding (it does not generate "+" for spaces, but
generates underscores).

So:
1. "{{PAGENAMEE|Q & A}}" returns in fact "Q_%26_A"
2. "{{PAGENAME|Q & A}}" returns in fact "Q &#38; A"
3. "{{URLENCODE:{{PAGENAME|Q & A}}}}" returns in fact at least this:
   "Q+%26%2338;+A"
   I don't know if URLENCODE also recodes the semicolon; if so, the result
   will be instead: "Q+%26%2338%3B+A"
In all cases this will be different from the result of case 1!

-----
This strange behavior means that some characters "permitted" in URLs to
MediaWiki sites are transformed in a very strange way, such as:

1. http://www.mediawiki.org/wiki/Q & A
   This is not directly a valid URL, but the browser transforms it to the
   URL-encoding of UTF-8 and requests:
   http://www.mediawiki.org/wiki/Q%20&%20A
   The server will accept it and load the page named "Q & A".

2. http://www.mediawiki.org/wiki/Q+%26%2338%3B+A
   The server parses this URL as containing a URL-encoded pagename, so it
   first URL-decodes it as:
   Q &#38; A
   The server will then parse the result and think it contains an anchor: it
   will attempt to load a page named only "Q &", with the anchor "38; A"
   dropped!

3. Valid page names may contain an isolated ampersand or ampersands as valid
   characters (internally they are HTML-encoded if you query their
   {{PAGENAME}}), but some sequences will generate errors, such as "&amp;",
   while "a amp;" will be accepted...

All this is completely inconsistent, but this time it does not occur in the
parser functions: it occurs at the server API level, when handling incoming
HTTP(S) requests that may, or may not, be HTML-encoded, whereas the HTTP
standard says that URLs should be ONLY URL-encoded! The server also performs
such double-decoding when resolving requests.
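
For illustration only, here is a minimal Python sketch of the three encoding cases and of the decode-then-split problem described in this comment. It is not MediaWiki code: the helper names (pagename_like, pagenamee_like, urlencode_like, split_decoded_request) are hypothetical, and they only approximate the behaviour described here for the single title "Q & A", assuming that the HTML-encoding turns "&" into a numeric entity and that URLENCODE behaves like form-style encoding.

```python
from urllib.parse import quote, quote_plus, unquote_plus


def pagename_like(title):
    """Hypothetical stand-in for {{PAGENAME|title}}: HTML-encode the ampersand."""
    return title.replace("&", "&#38;")


def pagenamee_like(title):
    """Hypothetical stand-in for {{PAGENAMEE|title}}: underscores for spaces, then %nn."""
    return quote(title.replace(" ", "_"), safe="")


def urlencode_like(text):
    """Hypothetical stand-in for {{URLENCODE:text}}: form-style encoding ('+' for spaces)."""
    return quote_plus(text)


def split_decoded_request(path):
    """URL-decode a request path, then cut it at the first '#' as if it were an anchor."""
    title, _, anchor = unquote_plus(path).partition("#")
    return title, anchor


if __name__ == "__main__":
    raw = "Q & A"
    print(pagenamee_like(raw))                 # Q_%26_A
    print(pagename_like(raw))                  # Q &#38; A
    double = urlencode_like(pagename_like(raw))
    print(double)                              # Q+%26%2338%3B+A
    print(split_decoded_request(double))       # ('Q &', '38; A')
```

Note that Python's quote_plus does encode the semicolon, so this sketch produces the second of the two variants given for case 3; whether URLENCODE itself does so remains an open question here.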
