https://bugzilla.wikimedia.org/show_bug.cgi?id=62468
--- Comment #4 from Nathan Larson <[email protected]> ---
I suspect most MediaWiki installations have robots.txt set up as
recommended at [[mw:Manual:Robots.txt#With_short_URLs]], with:

    User-agent: *
    Disallow: /w/

See, for example:

* https://web.archive.org/web/20140310075905/http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit
* https://web.archive.org/web/20140310075905/http://en.wikipedia.org/w/index.php?title=Main_Page&action=raw

So the Internet Archive couldn't retrieve action=raw even if it wanted
to. In fact, if I were to set up a script to download it, might I not be
in violation of robots.txt, which would make my script an ill-behaved
bot? I'm not sure my moral fiber can handle an ethical breach of that
magnitude. (A compliant script would check robots.txt before fetching;
see the first sketch below.)

However, some sites do allow indexing of their edit and raw pages, e.g.:

* https://web.archive.org/web/20131204083339/https://encyclopediadramatica.es/index.php?title=Wikipedia&action=edit
* https://web.archive.org/web/20131204083339/https://encyclopediadramatica.es/index.php?title=Wikipedia&action=raw
* https://web.archive.org/web/20131012144928/http://rationalwiki.org/w/index.php?title=Wikipedia&action=edit
* https://web.archive.org/web/20131012144928/http://rationalwiki.org/w/index.php?title=Wikipedia&action=raw

Dramatica and RationalWiki use all kinds of secret sauces, though, so
who knows what's going on there. Normally, edit pages carry

    <meta name="robots" content="noindex,nofollow" />

but that's not the case on Dramatica or RationalWiki edit pages (the
second sketch below shows a quick way to check). Is there some config
setting or extension that changes the robot policy on edit pages? Also,
I wonder whether they had to ask the Internet Archive to archive those
pages, or whether the Internet Archive did it on its own initiative.
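
For what it's worth, the compliance check is easy to automate. Here is a
minimal sketch using Python's standard-library robotparser, assuming the
wiki serves robots.txt at the usual top-level location:

    from urllib import robotparser

    # Load Wikipedia's robots.txt (the same file the Internet Archive
    # would consult before crawling).
    rp = robotparser.RobotFileParser("https://en.wikipedia.org/robots.txt")
    rp.read()

    # The action=raw URL from the examples above. With "Disallow: /w/"
    # in place, can_fetch() should return False for any /w/ URL.
    url = "https://en.wikipedia.org/w/index.php?title=Main_Page&action=raw"
    if rp.can_fetch("*", url):
        print("robots.txt permits fetching", url)
    else:
        print("robots.txt disallows", url, "- a well-behaved bot skips it")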
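
And as for which wikis noindex their edit pages, that can be checked
directly. A rough sketch, again standard library only; matching on raw
HTML is crude and a real check would use an HTML parser, but it answers
the question for a given URL:

    from urllib.request import Request, urlopen

    # One of the RationalWiki edit pages discussed above. The User-Agent
    # name here is made up for this illustration.
    url = "http://rationalwiki.org/w/index.php?title=Wikipedia&action=edit"
    req = Request(url, headers={"User-Agent": "robot-policy-check/0.1"})
    html = urlopen(req).read().decode("utf-8", errors="replace")

    # MediaWiki normally emits the noindex,nofollow robots meta tag on
    # edit pages; its absence is what lets these pages get archived.
    if 'name="robots"' in html and "noindex" in html:
        print("edit page is marked noindex")
    else:
        print("no noindex robots meta tag found")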
