hi,

Thanks for this information, but since I'm using a Solr index, when I run a 
search I get a blank result.
For example, if a search returns 10 documents, 9 will be fine 
(because I display the title and the first 4 lines of content), but I get one 
blank result because of this page (with no content and no title)! I don't 
understand why it is in the index, since it was set as noindex!?

Here is an example:

Searching for word1:

Results:

1- title 1 : content1
2- title 1 : content2
3- title 1 : content3
4- title 1 : content4
5- title 1 : content5
6- title 1 : content6
7- title 1 : content7
8- title 1 : content8
9-    ....BLANK......
10- title 1 : content10
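
A minimal sketch of hiding such a blank hit at display time. This is only a workaround on the search front end, not an explanation of why the document is indexed. It assumes each result is available as a plain dict with hypothetical `url`, `title` and `content` keys; no Solr or Nutch API is used here.

```python
def drop_blank_hits(hits):
    """Return only the hits that have a non-empty title or content.

    `hits` is assumed to be a list of dicts; the field names
    "title" and "content" are illustrative assumptions.
    """
    kept = []
    for hit in hits:
        # A missing field and a whitespace-only field are both treated as blank.
        title = (hit.get("title") or "").strip()
        content = (hit.get("content") or "").strip()
        if title or content:
            kept.append(hit)
    return kept


# Hypothetical results: the second one mimics the noindex page,
# which carries a url but no title and no content.
hits = [
    {"url": "http://example/a", "title": "title 1", "content": "content1"},
    {"url": "http://example/foo/bar"},
]
visible = drop_blank_hits(hits)
```

With a filter like this, the list above would show 9 entries instead of 10, but the blank document would still be in the index.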





> From: [email protected]
> Date: Thu, 10 Dec 2009 13:33:18 -0600
> Subject: Re: NOINDEX, NOFOLLOW
> To: [email protected]
> 
> On Thu, Dec 10, 2009 at 12:22 PM, BELLINI ADAM <[email protected]> wrote:
> >
> > hi,
> >
> > I have a page with <meta name="robots" content="noindex,nofollow" />. I 
> > know that Nutch obeys this tag because I don't find the content and the 
> > title in my index, but I expected that this document would not be 
> > present in the index at all. Why does it keep the document in my index 
> > with no title and no content?
> >
> > I'm using the index-basic and index-more plugins, and I want to understand 
> > why Nutch still fills in the url, date, boost, etc. when it did not do so 
> > for title and content.
> >
> > I was thinking that if Nutch obeys nofollow and noindex, it would 
> > skip the whole document!
> >
> > Or maybe I misunderstood something. Can you please explain this behavior to me?
> >
> > best regards.
> >
> 
> My guess is that the page is recorded to note that the page shouldn't
> be fetched, I'm guessing the status is one of the magic values.  It
> probably re-fetches the page periodically to ensure it has the list.
> So the URL and the date make sense to me as to why they populate them.
>  I don't know why it is computing the boost, other than the fact that
> it might be part of the OPIC scoring algorithm.  If the scoring
> algorithm ever uses the scores/boost of the pages that you point at as
> a contributing factor, it would make total sense.  So even though it
> doesn't index "http://example/foo/bar", knowing which pages point
> there, and what their scores are, could contribute to the scores of pages
> that you do index and that contain an outlink to that page.
> 
> Kirby
                                          
