On Mon, Jan 16, 2012 at 15:10, Ronald Chmara <[email protected]> wrote:
> http://lists.pdxlinux.org/pipermail/plug/
> <META NAME="robots" CONTENT="noindex,follow">
>
> http://lists.pdxlinux.org/pipermail/plug/2012-January/thread.html
> <META NAME="robots" CONTENT="noindex,follow">
>
> http://lists.pdxlinux.org/pipermail/plug/2012-January/074836.html
> <META NAME="robots" CONTENT="index,nofollow">
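The robots meta directives quoted above can be read mechanically. Below is a minimal, simplified sketch. It assumes the common crawler convention that pages are indexed and followed by default, so only "noindex", "nofollow" and the shorthand "none" change anything. It ignores other tokens (such as "noarchive") and crawler-specific meta names. The function name `parse_robots_meta` is hypothetical.

```python
def parse_robots_meta(content: str) -> dict:
    """Interpret a robots meta CONTENT value such as "noindex,follow".

    Simplified: crawlers default to index + follow, so only negative
    directives matter. "none" is shorthand for "noindex, nofollow".
    """
    tokens = {token.strip().lower() for token in content.split(",")}
    if "none" in tokens:
        tokens |= {"noindex", "nofollow"}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


# The three pages quoted above, with their CONTENT values.
PAGES = {
    "http://lists.pdxlinux.org/pipermail/plug/": "noindex,follow",
    "http://lists.pdxlinux.org/pipermail/plug/2012-January/thread.html": "noindex,follow",
    "http://lists.pdxlinux.org/pipermail/plug/2012-January/074836.html": "index,nofollow",
}

for url, content in PAGES.items():
    print(url, parse_robots_meta(content))
```

Both index pages come out as index=False, follow=True. The individual message comes out as index=True, follow=False.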
This looks okay to me. From what I read, it's saying not to index the index pages, but to follow them. When it hits an actual message, it's saying to index the message but not to follow any links in it. Perhaps it's not able to find the original index pages because nothing in its database links to them in a way it can follow.

Digging... http://pdxlinux.org/robots.txt has Disallow entries for /mailman/ and /pipermail/. However, robots.txt rules apply only to the host that serves them, so those entries cover pdxlinux.org and not lists.pdxlinux.org. The email archives are also linked directly from pdxlinux.org/mail/, which is crawlable.

Digging... The two messages from October that Google did index are showing up because they're linked from a web history of the #orlug IRC channel. According to Google, this was the only place they were linked from:
http://home.borked.us/irc/urllog.shtml

Bing shows only 12 messages from the mailing list archive in all of 2011. However, Bing also doesn't show much for previous years: 159 pages indexed across all years. (They lie and say 1,510 results until you get to page 9, and then the count changes to 159.)

Anyone else with ideas? Is there a place someone could've submitted a URL to tell Google we don't want to be indexed anymore? Or is it possible they just haven't gotten around to it in a while?
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
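The point above about robots.txt being scoped to a single host can be illustrated with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions. The rule lines are reconstructed from the description of http://pdxlinux.org/robots.txt, and the `User-agent: *` line is a guess. The sketch also assumes lists.pdxlinux.org serves no robots.txt rules of its own. `build_parsers` and `allowed` are hypothetical helper names.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Assumed contents of http://pdxlinux.org/robots.txt, based on the
# description above (the User-agent line is a guess).
PDXLINUX_ROBOTS = [
    "User-agent: *",
    "Disallow: /mailman/",
    "Disallow: /pipermail/",
]


def build_parsers() -> dict:
    """Map each host to the robots.txt rules served by *that* host."""
    parser = RobotFileParser()
    parser.parse(PDXLINUX_ROBOTS)
    # Hypothetical: no robots.txt rules known for lists.pdxlinux.org.
    return {"pdxlinux.org": parser}


def allowed(parsers: dict, url: str, agent: str = "Googlebot") -> bool:
    """Check a URL only against the robots.txt of its own host."""
    host = urlsplit(url).netloc
    parser = parsers.get(host)
    if parser is None:
        # No robots.txt rules for this host: nothing is disallowed.
        return True
    return parser.can_fetch(agent, url)


parsers = build_parsers()
print(allowed(parsers, "http://pdxlinux.org/pipermail/plug/"))
print(allowed(parsers, "http://lists.pdxlinux.org/pipermail/plug/"))
print(allowed(parsers, "http://pdxlinux.org/mail/"))
```

The /pipermail/ path is blocked on pdxlinux.org. The same path on lists.pdxlinux.org is not blocked, because a crawler would fetch that host's own robots.txt instead. pdxlinux.org/mail/ stays crawlable because it does not start with either disallowed prefix.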
