-adddays is what I was looking for :)

Giving this a "whirl" now!

-byron

--- Byron Miller <[EMAIL PROTECTED]> wrote:
> Wasn't there some code to force the expiration? I
> thought I saw something on the list.
> 
> Does anyone have a better archive of this list than
> SourceForge? Search on SourceForge really stinks :)
> 
> Thanks for the input, Doug. I'm going to start
> refetching some of this material and make the
> recommended adjustments.
> 
> When you refer to link analysis, are you talking
> about skipping the analyze db piece between segment
> fetches and using what comes in via the fetch/inject
> process?
> 
> -byron
> --- Doug Cutting <[EMAIL PROTECTED]> wrote:
> > Byron Miller wrote:
> > > I did the entire dmoz (not a subset) and I only ran
> > > the link analysis as 1 iteration (a couple of times
> > > in a row), and when I did new segments I did about
> > > 6-m million at a time.
> > 
> > Byron,
> > 
> > When I look at explanations on http://www.mozdex.org/
> > I see very large document boost values, which
> > correspond to link analysis scores.  It appears to me
> > that the link analysis algorithm has somehow run
> > amok.  I wonder if you might be better off without it.
> > 
> > One can radically diminish the impact of link analysis
> > scores on searches by setting indexer.score.power to a
> > very small value, e.g. 0.01.  Note that you will then
> > have to re-index, however.
> > 
> > Note that link analysis scores are also used to
> > prioritize pages for fetching.  So if you don't
> > perform any link analysis then you'll just end up
> > doing a breadth-first crawl.
> > 
> > A final note: the pages you fetch initially don't
> > have a good set of incoming anchor texts associated
> > with them until you fetch them the second time.  (We
> > don't know about links we haven't seen yet.)  So,
> > when you initially inject the DMOZ pages it's a good
> > idea to set db.default.fetch.interval to something
> > smaller, like 7, so that these pages will be
> > refreshed sooner with more complete anchor texts.
> > 
> > According to research, searching incoming anchor
> > text without link analysis provides most of the
> > benefits of both combined.  So it really improves
> > results to get good anchor texts.
> > 
> > Doug
> > 
> > 
> > 
> >
>
-------------------------------------------------------
> > This SF.Net email is sponsored by Sleepycat
> Software
> > Learn developer strategies Cisco, Motorola,
> Ericsson
> > & Lucent use to deliver
> > higher performing products faster, at low TCO.
> >
>
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
> > _______________________________________________
> > Nutch-general mailing list
> > [EMAIL PROTECTED]
> >
>
https://lists.sourceforge.net/lists/listinfo/nutch-general



