From what I know, the analyze db was a black box that no one has touched in
a LONG time.

I spent a lot of time looking at the distributed webdb, trying to speed up
this process (it took forever on a 150 million page db and used way too
much disk space), but found it easier to just do what Doug recommended.
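
As a rough sketch, that change would look something like the following in a
site override file. Only the two property names come from Doug's note quoted
below; the file name (nutch-site.xml) and the <nutch-conf> root element are
assumptions based on the Nutch 0.x config layout, so check them against the
nutch-default.xml shipped with your version.

```xml
<?xml version="1.0"?>
<!-- Hypothetical conf/nutch-site.xml; file name and root element are assumed -->
<nutch-conf>

  <!-- Score fetchlist entries by inbound link count instead of running analyze -->
  <property>
    <name>fetchlist.score.by.link.count</name>
    <value>true</value>
  </property>

  <!-- Boost documents at index time by inbound link count -->
  <property>
    <name>indexer.boost.by.link.count</name>
    <value>true</value>
  </property>

</nutch-conf>
```

With both set to true, the analyze step is simply skipped.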

Perhaps an analyzer that could take all of our tweaks and provide an
interface for saving them as reusable "templates" would be a neat idea (or
even plugins for the analyzer to do unique processing).

-byron

-----Original Message-----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: [email protected]
Date: Sun, 22 May 2005 00:37:55 +0200
Subject: Re: Hardware requirements and some other questions about Nutch

> Byron Miller wrote:
> > Here is what the great Doug said:
> > 
> > "
> > Are you using link analysis? Perhaps it is doing you a disservice by
> > prioritizing one site above the others. Try, in place of the analyze
> > command, setting both fetchlist.score.by.link.count and
> > indexer.boost.by.link.count to true. Please tell us how that works
> > for you.
> > 
> > Doug"
> > 
> > I did this and haven't run analyze since then, and you can see the
> > results on mozdex.com looking pretty good!
> 
> Both methods boost well-connected pages and penalize
> poorly-connected ones. However, if I understand this correctly, the
> implications of using this method instead of DB analysis are the
> following:
> 


