Wasn't there some code to force the expiration? I
thought I saw something on the list.

Does anyone have a better archive of this list than
SourceForge? Search on SourceForge really stinks
:)

Thanks for the input, Doug. I'm going to start
refetching some of this material and make the
recommended adjustments.

When you refer to link analysis, are you talking about
skipping the analyze db step between segment fetches
and using what comes in via the fetch/inject process?

-byron
--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Byron Miller wrote:
> > I did the entire dmoz (not a subset) and i only ran
> > the link analysis as 1 iteration (couple of times in a
> > row) and when i did new segments i did about 6-m
> > million at a time.
> 
> Byron,
> 
> When I look at explanations on http://www.mozdex.org/ I see very large
> document boost values, which correspond to link analysis scores.  It
> appears to me that the link analysis algorithm has somehow run amok.  I
> wonder if you might be better off without it.
> 
> One can radically diminish the impact of link analysis scores on
> searches by setting indexer.score.power to a very small value, e.g.
> 0.01.  Note that you will then have to re-index, however.
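
To make the effect of a small indexer.score.power concrete, here is a
minimal sketch. It assumes the document boost is computed as the link
analysis score raised to that power; that formula and the sample scores
are assumptions for illustration, not taken from the Nutch source.

```python
# Assumption: boost = score ** power. The real indexer may differ.
def boost(score: float, power: float) -> float:
    """Return the hypothetical document boost for a link analysis score."""
    return score ** power

# Made-up scores spanning several orders of magnitude.
scores = [1.0, 10.0, 1000.0, 100000.0]

for power in (1.0, 0.5, 0.01):
    boosts = [round(boost(s, power), 3) for s in scores]
    print(f"power={power}: {boosts}")
```

Under that assumption, a power of 0.01 squeezes a score of 100000 down
to a boost of roughly 1.12, so even runaway scores barely move the
ranking.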
> 
> Note that link analysis scores are also used to prioritize pages for
> fetching.  So if you don't perform any link analysis then you'll just
> end up doing a breadth-first crawl.
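
The difference between the two fetch orders can be sketched as follows.
This is only an illustration: the URLs and scores are hypothetical, and
a FIFO queue stands in for the breadth-first order you get when every
page has the same score.

```python
import heapq
from collections import deque

# Hypothetical pages in discovery order, with made-up link scores.
discovered = [("a.example/", 1.0), ("b.example/", 5.0), ("c.example/", 2.0)]

def fifo_order(pages):
    """Fetch in discovery order, as a breadth-first crawl would."""
    queue = deque(pages)
    return [queue.popleft()[0] for _ in range(len(queue))]

def score_order(pages):
    """Fetch the highest-scoring page first, using a max-heap."""
    heap = [(-score, url) for url, score in pages]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

print(fifo_order(discovered))
print(score_order(discovered))
```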
> 
> A final note: the pages you fetch initially don't have a good set of
> incoming anchor texts associated with them until you fetch them the
> second time.  (We don't know about links we haven't seen yet.)  So, when
> you initially inject the DMOZ pages it's a good idea to set
> db.default.fetch.interval to something smaller, like 7, so that these
> pages will be refreshed sooner with more complete anchor texts.
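
As a rough illustration of why the smaller interval helps, the sketch
below computes when a page becomes due for its second fetch. It assumes
the interval is a number of days added to the first fetch date; the
start date and the 30-day comparison value are hypothetical.

```python
from datetime import date, timedelta

def next_fetch(fetched_on: date, interval_days: int) -> date:
    """Return the date a page becomes due for refetching."""
    return fetched_on + timedelta(days=interval_days)

first_fetch = date(2005, 1, 1)  # hypothetical injection/fetch date

# Interval of 7 (as suggested above) versus a hypothetical 30.
print(next_fetch(first_fetch, 7))
print(next_fetch(first_fetch, 30))
```

With the shorter interval the second fetch, and with it the fuller set
of anchor texts, arrives weeks earlier.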
> 
> According to research, searching incoming anchor text without link
> analysis provides most of the benefits of both combined.  So it really
> improves results to get good anchor texts.
> 
> Doug
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by Sleepycat Software
> Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
> higher performing products faster, at low TCO.
> http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
> _______________________________________________
> Nutch-general mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-general


