-addays is what I was looking for :) Giving this a "whirl" now!
-byron

--- Byron Miller <[EMAIL PROTECTED]> wrote:
> Wasn't there some code to force the expiration? I
> thought I saw something in the list.
>
> Does anyone have a better archive of this list other
> than Sourceforge? Search on Sourceforge really stinks :)
>
> Thanks for the input Doug, I'm going to start
> refetching some of this material and make the
> recommended adjustments.
>
> When you refer to link analysis, are you talking about
> ignoring the analyze db piece between segment fetching
> and using what comes in via the fetch/inject process?
>
> -byron
>
> --- Doug Cutting <[EMAIL PROTECTED]> wrote:
> > Byron Miller wrote:
> > > I did the entire dmoz (not a subset) and I only ran
> > > the link analysis as 1 iteration (couple of times in a
> > > row) and when I did new segments I did about 6-m
> > > million at a time.
> >
> > Byron,
> >
> > When I look at explanations on http://www.mozdex.org/ I see very large
> > document boost values, which correspond to link analysis scores. It
> > appears to me that the link analysis algorithm has somehow run amok. I
> > wonder if you might be better off without it.
> >
> > One can radically diminish the impact of link analysis scores on
> > searches by setting indexer.score.power to a very small value, e.g.
> > 0.01. Note that you will then have to re-index, however.
> >
> > Note that link analysis scores are also used to prioritize pages for
> > fetching. So if you don't perform any link analysis then you'll just
> > end up doing a breadth-first crawl.
> >
> > A final note: the pages you fetch initially don't have a good set of
> > incoming anchor texts associated with them until you fetch them the
> > second time. (We don't know about links we haven't seen yet.) So, when
> > you initially inject the DMOZ pages it's a good idea to set
> > db.default.fetch.interval to something smaller, like 7, so that these
> > pages will be refreshed sooner with more complete anchor texts.
> >
> > According to research, searching incoming anchor text without link
> > analysis provides most of the benefits of both combined. So it really
> > improves results to get good anchor texts.
> >
> > Doug

-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
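
For anyone wondering why a value like 0.01 for indexer.score.power flattens runaway link analysis scores, here is a rough sketch. It assumes the indexer turns a page's score into a document boost by raising it to that power. The property name suggests this, but the thread does not spell it out, so treat the formula, the `boost` helper, and the sample scores as hypothetical.

```python
def boost(score: float, power: float) -> float:
    """Hypothetical document boost: link analysis score raised to `power`.

    This formula is an assumption for illustration, not taken from the thread.
    """
    return score ** power


# Made-up link analysis scores, including one that has "run amok".
scores = [1.0, 10.0, 1000.0, 1_000_000.0]

for power in (1.0, 0.5, 0.01):
    boosts = [boost(s, power) for s in scores]
    # Ratio between the largest and smallest boost: how much the score
    # can skew ranking between two otherwise similar documents.
    spread = max(boosts) / min(boosts)
    print(f"power={power}: boosts={[round(b, 3) for b in boosts]} spread={spread:.3f}")
```

Under that assumption, a small power leaves the ordering of the scores intact but squeezes all boosts close to 1, so the text match (including anchor text) dominates the ranking. The boost is stored at index time, which is consistent with Doug's note that you have to re-index after changing the setting.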
