I'm glad you got the slowness issue straightened out.

When you import the DMOZ URLs into your Nutch DB, the "-subset" option isn't really meant to limit the size of your fetch lists, and that becomes even more true once you start re-fetching. You can skip the subset step and let all of the URLs in, unless you have your own custom filtering requirement. Use the "-topN" option instead when you generate your segment; it creates a segment with an exact number of URLs. Below are examples of generating a segment with 1 million URLs to fetch, for each Nutch architecture:

(Nutch 0.7) bin/nutch generate db segments -topN 1000000
(Nutch 0.8+) bin/nutch generate crawl/crawldb crawl/segments -topN 1000000

----- Original Message ----
From: shrinivas patwardhan <[EMAIL PROTECTED]>
To: nutch-user@lucene.apache.org
Sent: Tuesday, January 2, 2007 4:25:13 AM
Subject: Re: fetcher : some doubts

thank you Sean Dean that sounds good .. i will try it out .

tell me if i am rite : i case of a dmoz index file is injected in the db .. then i generate only few segments by using -subset and then fetch them .. and then go on and generate the next set of segments

i hope i am heading the right way

and for the previous problem of the searching being slow .. it wasnt my hardware but my segments were corrupt i fixed them and the search runs fine now

Thanks & Regards
Shrinivas Patwardhan
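To tie it together, here is a sketch of the whole generate/fetch/update loop using -topN (Nutch 0.8 layout; the crawl/ paths are the usual tutorial defaults, and the number of rounds is an arbitrary choice, not something the commands require):

```shell
#!/bin/sh
# Sketch: bounded crawl rounds with -topN instead of -subset.
# Assumes a Nutch 0.8 install and a DMOZ URL list in dmoz/urls.

bin/nutch inject crawl/crawldb dmoz/urls      # one-time seed of the crawldb

for round in 1 2 3; do                        # number of rounds is arbitrary
  # Generate a segment capped at exactly 1M URLs.
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000000
  segment=`ls -d crawl/segments/* | tail -1`  # the segment just generated
  bin/nutch fetch $segment
  # Fold fetched pages and discovered links back into the crawldb,
  # so the next generate round can pick from them.
  bin/nutch updatedb crawl/crawldb $segment
done
```

This is the same "generate a few, fetch them, then generate the next set" cycle described in the question below, with -topN bounding each round.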