[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Krinkle krinklem...@gmail.com changed:

What    |Removed |Added
Version |1.16.0  |1.16.x

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #9 from Kiran Jonnalagadda j...@pobox.com 2010-10-31 11:47:34 UTC ---
Thank you, Daniel. This is a great explanation.

However, the exact same problem exists without the patch, in MediaWiki's default configuration with ugly URLs. Consider the URLs it would generate for an installation in /wiki:

/wiki/index.php?title=Example_page
/wiki/index.php?title=Example_page&month=11&year=2010

There is no way to use robots.txt to disallow the second URL without blocking the first. This, therefore, is a problem for the Semantic Result Formats extension to fix. It is not introduced by this patch.

As far as I can tell, there is no use case within MediaWiki's default configuration that touches line 824 of Title.php -- there is never a URL that includes a query but not an action parameter (apart from the default 'view' action, which also bypasses line 824). Does MediaWiki come with unit tests to verify this? I'm new to the source.

It appears that the patch touches an area of code that is rarely used and hence has been long overlooked. The indexing problem it potentially causes with SRF Calendar already exists without the patch. The patch only makes the function behave consistently.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #10 from Kiran Jonnalagadda j...@pobox.com 2010-10-31 12:05:31 UTC ---
One possible solution is to define a new 'null' or 'none' action in $wgActionPaths that gets looked up when no action is called for (this being different from the default 'view' action). getLocalURL can then use the installation's preferred URL syntax instead of assuming the use of pretty or ugly URLs.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #11 from Kiran Jonnalagadda j...@pobox.com 2010-10-31 12:49:33 UTC ---
> As far as I can tell, there is no use case within MediaWiki's default configuration that touches line 824 of Title.php -- there is never a URL that includes a query but not an action parameter (apart from the default 'view' action, which also bypasses line 824).

Whoops. Completely wrong here. History browsing uses query parameters without an action.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #12 from Kiran Jonnalagadda j...@pobox.com 2010-10-31 13:06:54 UTC ---
> Whoops. Completely wrong here. History browsing uses query parameters without an action.

... and those URLs are generated without touching line 824 of Title.php, so the presence of this patch makes no difference.

From the documentation for getLocalURL:
http://svn.wikimedia.org/doc/classTitle.html#a75d9fae7aabf6187318c5a298b01c5ef

  $query  Mixed: an optional query string; if not specified, $wgArticlePath will be used.

So the documentation is specific: $wgArticlePath is only used when no query is specified. This patch does the logically expected thing, but violates the documented expectation.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #13 from Daniel Friesen mediawiki-b...@nadir-seen-fire.com 2010-10-31 14:38:31 UTC ---
I just took an actual look at the patch and a larger look at the code, and I see two things staring at me...

Firstly, it feels to me that something is wrong with the $variant stuff... I'm not quite up to speed on the variant stuff, so it's either a MW bug where variant handling that should be in the second block isn't there, or variant handling is supposed to be done only for /short/Urls and the patch adds a new location where /short/Urls end up showing up but is missing the variant handling code, or it's only meant to happen when $query is empty and I'm just too far behind.

Though that's probably less of an issue compared to this. Looking at line 821 I see what seems to be part of a feature that allows you to pass '-' to getLocalURL as a way of saying "I don't have any query, but I want you to give me a long index.php style url because I have a case where getting short urls will cause chaos, so NO short urls", although it does look like a tweak would be nice so there is no trailing &. The way the patch is written seems to break that... if this patch is applied it looks like $query = '-' will start outputting short urls when getLocalURL was explicitly asked to output long urls, and additionally this will be done without the variant handling that would otherwise have been done.

Also, now that I notice the comment, the title= showing up in a short url does not look like a proper implementation of a feature; even without the other issues I'd reject the patch until that was fixed.

Re-reading the comments again, it looks to me like you have a lingering misunderstanding of $wgArticlePath, $wgActionPaths and what those blocks of code actually do.
Firstly, of course, that note on how robots.txt can't be used in the default config is moot: the robots.txt trick is part of an advanced, robots-optimized shorturl style configuration, and for the reasons you describe it requires short urls to function. Just because a default configuration doesn't support it doesn't mean that it should be broken universally (this patch) when it is only supported in an advanced configuration. I should also point out that if MediaWiki detects that the server is able to support it, MediaWiki actually sets up $wgArticlePath to work in a /index.php/Article form, so robots.txt IS actually usable by default on the right server by disallowing the /index.php? prefix.

Now for the actual misunderstanding. You commented:

> As far as I can tell, there is no use case within MediaWiki's default configuration that touches line 824 of Title.php -- there is never a URL that includes a query but not an action parameter (apart from the default 'view' action, which also bypasses line 824).

As I understand from this (even taking your later comment on history into account), you believe that calls to getLocalURL containing a query with an action parameter are always caught by the chunk of code between lines 807-819. That is incorrect. The block of code there is ONLY ever applied if the non-default $wgActionPaths (NOT $wgArticlePath, but $wgActionPaths) is set, and in fact very, very few MediaWiki installations actually enable it. So the code in lines 810-817, which handles getLocalURL calls that have an action= in the query, is almost NEVER run, at least not unless the wiki is one of the rare few that have specially configured $wgActionPaths. The line 824 you believe is bypassed by default is in fact NEVER EVER bypassed by default. In the default config, line 824 handles every single call to getLocalURL that contains anything at all inside $query.
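The branching described in this comment can be modelled in a few lines. The sketch below is Python rather than MediaWiki's PHP and is only an approximation under stated assumptions: the function name local_url, its keyword arguments, and the default paths /w/index.php and /wiki/$1 are hypothetical stand-ins for $wgScript, $wgArticlePath and $wgActionPaths; the dbkey is assumed to be URL-safe already; variant handling is left out; and the '-' form is returned without a trailing &.

```python
from typing import Dict, Optional


def local_url(dbkey: str, query: str = "", *,
              script: str = "/w/index.php",
              article_path: str = "/wiki/$1",
              action_paths: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical model of how getLocalURL picks a URL form.

    Assumes dbkey is already URL-safe; variant handling and hooks
    are deliberately omitted.
    """
    if query == "":
        # No query at all: the only case that uses the article path (short URL).
        return article_path.replace("$1", dbkey)
    if query == "-":
        # Faux query meaning "no query, but give me the long index.php form".
        return f"{script}?title={dbkey}"
    if action_paths:
        # Only reached when an action-path map (like $wgActionPaths) is configured.
        params = [p for p in query.split("&") if p]
        action = next((p.split("=", 1)[1] for p in params
                       if p.startswith("action=")), None)
        if action is not None and action in action_paths:
            rest = "&".join(p for p in params if not p.startswith("action="))
            url = action_paths[action].replace("$1", dbkey)
            if rest:
                # Append remaining parameters with the right separator.
                url += ("&" if "?" in url else "?") + rest
            return url
    # Default: any other non-empty query yields the long index.php form.
    return f"{script}?title={dbkey}&{query}"


print(local_url("Opinion_calendar"))                        # /wiki/Opinion_calendar
print(local_url("Opinion_calendar", "month=11&year=2010"))  # /w/index.php?title=Opinion_calendar&month=11&year=2010
print(local_url("Main_Page", "action=edit&section=2",
                action_paths={"edit": "/edit/$1"}))         # /edit/Main_Page?section=2
```

Under these assumptions only the empty query ever reaches article_path; a calendar query such as month=11&year=2010 always comes back in index.php form, and the action-path branch is skipped entirely unless a map is passed in.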
And to reiterate, the expected behavior for getLocalURL and the like is that urls with no query, and not marked with a faux '-' query to disallow shorturls, will have a shorturl returned if possible, while any url with a query is ALWAYS returned in long index.php form. Changing things so that calls that currently output long urls output short urls will break things.

I forgot about it, but there is another reason why that /w/ robots.txt trick is used. The nature of most pages that use queries is that they are dynamic. In other words, they likely contain links which could cause search engines to endlessly spider dynamically generated content that the wiki does not want them to spider (i.e. search queries). So changing these to short urls may cause dynamic pages to suddenly start being spidered by search engines when they were originally intended not to be.

Because of this, I believe that ANY feature we add that allows getLocalURL to return short urls with queries instead of long urls MUST be an explicit opt-in, where getLocalURL is passed an extra optional argument telling it that shorturls with queries are ok, and that any code changed to use this be done by someone understanding
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #14 from Kiran Jonnalagadda j...@pobox.com 2010-10-31 14:56:45 UTC ---
Thank you for the most excellent comment, Daniel. getLocalURL is far too core to MediaWiki to make changes without understanding the implications thoroughly, which I clearly don't. I propose this ticket be marked INVALID.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Bryan Tong Minh bryan.tongm...@gmail.com changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |        |WONTFIX
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Reedy s...@reedyboy.net changed:

What     |Removed                                   |Added
Keywords |easy                                      |need-review, patch
Summary  |API call Title->getLocalURL does not      |Title->getLocalURL does not respect
         |respect $wgArticlePath when given a query |$wgArticlePath when given a query
Severity |enhancement                               |minor
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Niklas Laxström niklas.laxst...@gmail.com changed:

What |Removed |Added
CC   |        |niklas.laxst...@gmail.com

--- Comment #2 from Niklas Laxström niklas.laxst...@gmail.com 2010-10-30 19:05:32 UTC ---
I think this is by design. If you want prettier urls you can use $wgActionPaths.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Bryan Tong Minh bryan.tongm...@gmail.com changed:

What |Removed |Added
CC   |        |bryan.tongm...@gmail.com

--- Comment #3 from Bryan Tong Minh bryan.tongm...@gmail.com 2010-10-30 19:13:09 UTC ---
This breaks urls like action=raw, which can only be used in ugly mode.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Daniel Friesen mediawiki-b...@nadir-seen-fire.com changed:

What |Removed |Added
CC   |        |mediawiki-b...@nadir-seen-fire.com

--- Comment #4 from Daniel Friesen mediawiki-b...@nadir-seen-fire.com 2010-10-30 19:38:44 UTC ---
Additionally, I believe this is by design so that action=edit urls on wikis that are configured like Wikipedia can be made to always have /w/index.php style urls, so they can be blanket-blocked by robots.txt. /wiki/Article?action=edit can't adequately be covered in robots.txt, and while noindex is on the page itself, it's much better if search engines can be prevented from even wasting bandwidth by loading urls they are just going to get a noindex on.

So you'll either have to use $wgActionPaths or start a bug asking for some sort of $wgShortActions array which can be loaded with an explicit list of actions that should take the /wiki/Article?action= form instead of the default index.php?title=&action= form.
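Daniel's suggested $wgShortActions does not exist in MediaWiki 1.16; the Python sketch below only illustrates the idea, using hypothetical names (short_action_url, short_actions) and assumed default paths. Actions listed in the set get the /wiki/Article?action= form, and everything else keeps the long index.php form that a /w/ robots.txt rule can still block.

```python
from typing import AbstractSet


def short_action_url(dbkey: str, action: str, short_actions: AbstractSet[str], *,
                     script: str = "/w/index.php",
                     article_path: str = "/wiki/$1") -> str:
    """Hypothetical opt-in: only explicitly listed actions get short URLs."""
    if action in short_actions:
        base = article_path.replace("$1", dbkey)
        # Use '&' if the article path already carries a query string.
        sep = "&" if "?" in base else "?"
        return f"{base}{sep}action={action}"
    # Everything else stays in the long form, so "Disallow: /w/" still covers it.
    return f"{script}?title={dbkey}&action={action}"


print(short_action_url("Article", "history", {"history"}))  # /wiki/Article?action=history
print(short_action_url("Article", "edit", {"history"}))     # /w/index.php?title=Article&action=edit
```

Making the short form an explicit allow-list means a wiki that never sets it keeps today's behaviour, in line with the opt-in requirement raised in this thread.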
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #5 from Kiran Jonnalagadda j...@pobox.com 2010-10-30 23:03:04 UTC ---
getLocalURL() already has special handling for when it sees 'action' in the query parameters. This bug report and patch are for queries that do not specify an action. Therefore, $wgActionPaths does not apply here and action=raw isn't affected.

Here is where 'action' is checked for:
http://svn.wikimedia.org/viewvc/mediawiki/tags/REL1_16_0/phase3/includes/Title.php?view=markup#l804

The patch assumes $wgArticlePath is set up for pretty URLs. This may or may not be handled by the wfAppendQuery() function called elsewhere from within getLocalURL(). I'm not sure where wfAppendQuery() is defined.
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

Platonides platoni...@gmail.com changed:

What |Removed |Added
CC   |        |platoni...@gmail.com

--- Comment #6 from Platonides platoni...@gmail.com 2010-10-30 23:15:16 UTC ---
If you change a url like

/w/index.php?title=Opinion_calendar&month=11&year=2010

to

/wiki/Opinion_calendar?month=11&year=2010

then robots.txt blocking /w/ won't apply and a spider will follow the whole calendar. I think this is a won't fix.
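Platonides' point can be checked with Python's standard-library urllib.robotparser, which matches Disallow paths by prefix. The one-rule robots.txt below is an assumed Wikipedia-style setup for illustration, not copied from any real wiki.

```python
from urllib.robotparser import RobotFileParser

# Assumed Wikipedia-style rule: block everything served from /w/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark rules as loaded; can_fetch() returns False until then
    return parser


rp = make_parser(ROBOTS_TXT)
long_form = "/w/index.php?title=Opinion_calendar&month=11&year=2010"
short_form = "/wiki/Opinion_calendar?month=11&year=2010"

print(rp.can_fetch("*", long_form))   # False: starts with the /w/ prefix
print(rp.can_fetch("*", short_form))  # True: /wiki/ does not start with /w/
```

The same prefix check is why /wiki/Article?action=edit slips past this rule while /w/index.php?title=Article&action=edit is blocked.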
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #7 from Kiran Jonnalagadda j...@pobox.com 2010-10-30 23:30:18 UTC ---
I've confirmed that the patch does the right thing even when using ugly URLs.

Platonides has a very good point. Maybe rel=nofollow applies there?
[Bug 25718] Title->getLocalURL does not respect $wgArticlePath when given a query
https://bugzilla.wikimedia.org/show_bug.cgi?id=25718

--- Comment #8 from Daniel Friesen mediawiki-b...@nadir-seen-fire.com 2010-10-31 02:43:32 UTC ---
Even if you rel=nofollow a url, it's possible that someone may use WikiText to link to the same url without rel=nofollow, and it's also possible for the external link to come from another website, so there is no way to ensure that there are never any non-nofollow links sitting around pointing to urls that you don't want search engines to spider.

rel=nofollow also does not stop spiders from following a url. Spiders have lately taken more of a stance of reading rel=nofollow as "I don't endorse this, don't count it towards a pagerank" rather than "NEVER use this link to go to that url". Even if you rel=nofollow something, they may still follow the link to that page, so the only way to keep a spider from ever spidering a url is to use robots.txt; it's the only reliable tool. (We know rel=noindex is used on the pages themselves, but who wants search engines spidering every noindex'ed page they can see? It's a waste of requests on a high-traffic wiki.)