I have a question about the proper interpretation of a noindex robots directive in a meta tag (<meta name="robots" content="noindex" />).
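For reference, here is a minimal sketch of checking for the directive. It assumes the common reading of the robots meta tag, where the content attribute is a comma-separated, case-insensitive list of tokens and `none` implies `noindex, nofollow`. The class and method names are hypothetical and are not part of Nutch's API. Under this reading, a page for which `hasNoIndex` returns true would be left out of the index entirely, URL included.

```java
import java.util.Locale;

// Hypothetical helper, not Nutch code: interprets the content attribute
// of <meta name="robots" content="..." />.
public class RobotsMeta {

    /** True if the content value contains "noindex", or "none" (which implies noindex). */
    static boolean hasNoIndex(String content) {
        if (content == null) {
            return false;
        }
        for (String token : content.split(",")) {
            String t = token.trim().toLowerCase(Locale.ROOT);
            if (t.equals("noindex") || t.equals("none")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNoIndex("noindex"));           // true
        System.out.println(hasNoIndex("NOINDEX, nofollow")); // true
        System.out.println(hasNoIndex("none"));              // true
        System.out.println(hasNoIndex("index, follow"));     // false
    }
}
```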
When Nutch fetches such a page, the content, title, etc. of the page are not indexed, but the URL itself is. The document is searchable by terms in the URL. That is, if the URL of the page is http://www.mysite.com/onepage.html, the page is returned as a hit when searching for "onepage". Is it correct that Nutch does not index the content but still creates a Lucene document for a page with such a directive? Intuitively, it seems to me that it should not be searchable at all.

Thanks,
Charlie
