[Wikidata-bugs] [Maniphest] [Commented On] T144308: [Task] Disallow Special:GoToLinkedPage in wikidata.org/robots.txt

TheDJ Wed, 12 Oct 2016 06:55:01 -0700

TheDJ added a comment.

FYI: you disallowed crawling, but for modern search engines that doesn't mean you disallowed indexing. If another indexed page links to the URL, Google will still index it.
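To make the crawl/index distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt rule, the bot name and the example URLs are assumptions for illustration, not the actual wikidata.org configuration. The point is that robots.txt only answers "may this URL be fetched?"; it has no way to say "do not list this URL".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rule, modelled on the task title
# (not copied from the real wikidata.org/robots.txt).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Special:GoToLinkedPage
"""


def may_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow user_agent to fetch url.

    This is a crawl permission only. A search engine that finds a link
    to a disallowed URL on another page can still list that URL.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only.
    print(may_crawl("https://www.wikidata.org/wiki/Special:GoToLinkedPage/enwiki/Q42"))
    print(may_crawl("https://www.wikidata.org/wiki/Q42"))
```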

To quote:

When you block URLs from being indexed in Google via robots.txt, they may still show those pages as URL only listings in their search results. A better solution for completely blocking the index of a particular page is to use a robots noindex meta tag on a per page bases. You can tell them to not index a page, or to not index a page and to not follow outbound links by inserting either of the following code bits in the HTML head of your document that you do not want indexed.

This is why we have NOINDEX and setRobotPolicy on OutputPage, etc. It's just that these are redirects and/or not necessarily HTML. That's why I pointed at X-Robots-Tag.
Another thing to pay attention to is indicating canonical URLs whenever possible.
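As a rough sketch of what that could look like for a redirect response, the helper below builds the relevant HTTP headers. The function name, its parameters and the example URL are hypothetical; this is not MediaWiki or Wikibase code, only an illustration of the header values. X-Robots-Tag carries the same directives as the robots meta tag but works for non-HTML responses, and a Link header with rel="canonical" is one way to indicate a canonical URL outside of HTML.

```python
from typing import List, Optional, Tuple


def redirect_headers(
    target_url: str, canonical_url: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Build headers for a redirect that asks search engines not to index it.

    Hypothetical helper for illustration; names are not taken from MediaWiki.
    """
    headers = [
        ("Location", target_url),
        # Header equivalent of <meta name="robots" content="noindex">.
        ("X-Robots-Tag", "noindex"),
    ]
    if canonical_url is not None:
        # Optional hint pointing at the preferred URL for this content.
        headers.append(("Link", f'<{canonical_url}>; rel="canonical"'))
    return headers


if __name__ == "__main__":
    # Illustrative target URL only.
    for name, value in redirect_headers(
        "https://en.wikipedia.org/wiki/Douglas_Adams",
        canonical_url="https://en.wikipedia.org/wiki/Douglas_Adams",
    ):
        print(f"{name}: {value}")
```

Note that a crawler only sees such a header if it is allowed to fetch the URL in the first place.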

TASK DETAIL

https://phabricator.wikimedia.org/T144308

EMAIL PREFERENCES

https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mbch331, TheDJ
Cc: Sjoerddebruin, TheDJ, Mbch331, Jonas, hoo, aude, Lydia_Pintscher, Tobi_WMDE_SW, Aklapper, thiemowmde, TerraCodes, D3r1ck01, MuhammadShuaib, Izno, Wikidata-bugs

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
