Hi, thanks for this information. However, I'm using a Solr index, and when I run a search I get a blank result. For example, if a search returns 10 documents, 9 are fine (I display the title and the first 4 lines of content), but one result is blank because of this page (no content and no title)! I don't understand why it is in the index at all, since it was set as noindex.
Here is an example, searching for word1:

results:
1- title 1 : content1
2- title 1 : content2
3- title 1 : content3
4- title 1 : content4
5- title 1 : content5
6- title 1 : content6
7- title 1 : content7
8- title 1 : content8
9- ....BLANK......
10- title 1 : content10

> From: [email protected]
> Date: Thu, 10 Dec 2009 13:33:18 -0600
> Subject: Re: NOINDEX, NOFOLLOW
> To: [email protected]
>
> On Thu, Dec 10, 2009 at 12:22 PM, BELLINI ADAM <[email protected]> wrote:
> >
> > hi,
> >
> > I have a page with <meta name="robots" content="noindex,nofollow" />. I
> > know that Nutch obeys this tag because I don't find the content and the
> > title in my index, but I was expecting that this document would not be
> > present in the index at all. Why does it keep the document in my index
> > with no title and no content?
> >
> > I'm using the index-basic and index-more plugins, and I want to understand
> > why Nutch still fills in the url, date, boost, etc. when it did not do so
> > for title and content.
> >
> > I was thinking that if Nutch obeys nofollow and noindex, it would
> > skip the whole document!
> >
> > Or maybe I misunderstood something. Can you please explain this behavior to me?
> >
> > Best regards.
> >
>
> My guess is that the page is recorded to note that the page shouldn't
> be fetched; I'm guessing the status is one of the magic values. It
> probably re-fetches the page periodically to ensure it has the list.
> So the URL and the date make sense to me as to why they populate them.
> I don't know why it is computing the boost, other than the fact that
> it might be part of the OPIC scoring algorithm. If the scoring
> algorithm ever uses the scores/boost of the pages that you point at as
> a contributing factor, it would make total sense. So even though it
> doesn't index "http://example/foo/bar", knowing which pages point
> there, and what their scores are, could contribute to the scores of pages
> that you do index and that contain an outlink to that page.
>
> Kirby
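
To make the blank result described above concrete, here is a minimal sketch of how such hits could be detected and dropped on the client side before display. It assumes each search hit arrives as a dict with optional `title` and `content` keys; the field names, the sample URLs, and the helpers `is_blank_hit` and `drop_blank_hits` are hypothetical and are not part of the Nutch or Solr API. It only hides the symptom and does not explain why the document was indexed.

```python
def is_blank_hit(hit):
    """Return True when a hit has neither a title nor any content."""
    # Treat missing, None, and whitespace-only values as empty.
    title = (hit.get("title") or "").strip()
    content = (hit.get("content") or "").strip()
    return not title and not content


def drop_blank_hits(hits):
    """Keep only the hits that have a title or some content."""
    return [hit for hit in hits if not is_blank_hit(hit)]


# Hypothetical hits: the second one mimics the noindex page, which has
# a url but no title and no content.
hits = [
    {"url": "http://example/a", "title": "title 1", "content": "content1"},
    {"url": "http://example/foo/bar"},
    {"url": "http://example/b", "title": "title 1", "content": "content10"},
]
visible = drop_blank_hits(hits)
```

With the sample data above, `visible` keeps the two hits that have a title and content and omits the url-only entry.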
