Why isn't 'analyze' supported anymore?

-----Original Message-----
From: Andy Liu [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 02, 2005 5:44 PM
To: [email protected]
Subject: Re: Memory usage2

I have found that merging indexes does help performance significantly.

If you're not using the cached pages for anything, I believe you can
delete the /content directory for each segment and the engine should
work fine (test before you try it for real!).  However, if you ever
have to reindex the segments for whatever reason, you'll run into
problems without the /content dirs.

Nutch doesn't use the HITS algorithm.  Nutch's analyze phase was based
on PageRank, but it's no longer supported.  By default Nutch boosts
documents based on the number of incoming links, which works well in
small document collections, but is not a robust method in a whole-web
environment.  In terms of search quality, Nutch would not be able to
hang with the "big dogs" of search just yet.  There's still much work
that needs to be done in the area of search quality and spam.
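
To make the link-count boosting concrete, here's a rough sketch in
Python.  This is not Nutch's code: the function names, the sample
links, and the log scaling are all assumptions for illustration, and
the actual formula Nutch applies may well differ.

```python
import math
from collections import Counter


def inlink_counts(links):
    """Count incoming links per target URL from (source, target) pairs."""
    # Deduplicate so a repeated link from the same source counts once.
    return Counter(target for _source, target in set(links))


def link_boost(inlinks):
    """Hypothetical boost: 1.0 with no inlinks, growing slowly after that."""
    # Log scaling is an assumption, not Nutch's documented formula.
    return 1.0 + math.log(1 + inlinks)


# Made-up link graph as (source, target) pairs.
links = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("a.example", "b.example"),
    ("a.example", "c.example"),  # duplicate, ignored
]

counts = inlink_counts(links)
boosts = {url: link_boost(n) for url, n in counts.items()}
```

Every link counts the same here no matter where it comes from, so
anyone who can create a lot of pages can raise their own count.  That
is the sort of thing that holds up in a small collection but not on
the whole web.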

Andy



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
