I have a question about the proper interpretation of a noindex robots
directive in a meta tag (<meta name="robots" content="noindex" />).

When Nutch fetches such a page, the content, title, etc. of the page
is not indexed, but the URL itself is.  The document is searchable by
terms in the URL.  That is, if the URL of the page is
http://www.mysite.com/onepage.html, the page is returned as a hit
when searching for "onepage".

Is it correct that Nutch does not index the content but still creates
a Lucene document for a page with such a directive?  Intuitively it
seems to me as if it should not be searchable at all.
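
To make the expectation concrete, here is a minimal sketch in Python of
the behavior I would have expected.  This is not Nutch code: the names
RobotsMetaParser and should_index are hypothetical, and it only uses
the Python standard library to read the robots meta tag.  It also
assumes that "none" should be treated like "noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) are also routed here by default.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def should_index(html):
    """Return False if the page asks not to be indexed at all.

    Hypothetical helper: under the interpretation I expected, a False
    result would mean no document is created for the page, so it could
    not be found even by terms in its URL.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return not ({"noindex", "none"} & parser.directives)
```

With this sketch, a page containing the meta tag above would be
skipped entirely rather than indexed by URL only.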

Thanks,
Charlie
