Why isn't 'analyze' supported anymore?

-----Original Message-----
From: Andy Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 02, 2005 5:44 PM
To: [email protected]
Subject: Re: Memory usage2

I have found that merging indexes does help performance significantly. If you're not using the cached pages for anything, I believe you can delete the /content directory for each segment and the engine should work fine (test before you try for real!). However, if you ever have to reindex the segments for whatever reason, you'll run into problems without the /content dirs.

Nutch doesn't use the HITS algorithm. Nutch's analyze phase was based on PageRank, but it's no longer supported. By default Nutch boosts documents based on the number of incoming links, which works well in small document collections but is not a robust method in a whole-web environment.

In terms of search quality, Nutch would not be able to hang with the "big dogs" of search just yet. There's still much work that needs to be done in the area of search quality and spamming.

Andy

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
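For readers unfamiliar with boosting by incoming links, here is a rough sketch of the general idea. It is not Nutch's actual scoring code: the function names, the sample link list, and the log-damped boost formula are all assumptions made up for illustration.

```python
import math
from collections import Counter


def inlink_counts(links):
    """Count distinct incoming links per target page.

    `links` is an iterable of (source, target) pairs. Duplicate pairs and
    self-links are ignored so a page cannot trivially inflate its own count.
    """
    distinct = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in distinct)


def inlink_boost(count):
    """Hypothetical damped boost: 1.0 with no inlinks, growing logarithmically."""
    return 1.0 + math.log1p(count)


# Hypothetical link graph: a -> c (twice), b -> c, a -> b, c -> c (self-link).
links = [("a", "c"), ("b", "c"), ("a", "b"), ("a", "c"), ("c", "c")]
counts = inlink_counts(links)
boosts = {page: inlink_boost(counts[page]) for page in ("a", "b", "c")}

for page in sorted(boosts, key=boosts.get, reverse=True):
    print(page, counts[page], round(boosts[page], 3))
```

A scheme like this treats every linking page as equally trustworthy, so anyone who can publish many pages can raise a target's count. PageRank-style methods instead weight each link by the importance of its source, which is one reason a plain count holds up in a small, controlled collection but less so on the open web.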
