On 6/20/07, Emmanuel JOKE <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> I have a cluster of 2 machines. I tried to crawl a website which contains
> over 1M pages. I notice that it takes a few days to complete the crawl.
> The logs said 0.5 p/s at 200 kb/s. It seems very slow. I would like to try
> Fetcher2, I guess it might improve the performance.
>
> It might be a stupid question, but I'm wondering how I set up my nutch to
> use Fetcher2 instead of Fetcher.
> Could you help me to understand?

Are you running nutch with the 'crawl' command, with separate commands
(inject, generate, fetch, etc.), or something else? If you are running
separate commands, all you have to do is change fetch to fetch2.

> Besides, what is the usual way to configure fetcher.server.delay? I
> was told that we should set this property to 1 second, but I can see in
> nutch-default.xml that it has been set to 5. What is the best thing to do
> to gain performance and still stay polite enough?

That's kind of between you and the server you are fetching, but I wouldn't
recommend a delay lower than 5 seconds.

> More tricks to gain performance are welcome
>
> E

-- 
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
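
For a rough sense of the numbers in this thread, here is a minimal back-of-the-envelope sketch. It assumes that fetcher.server.delay applies per host and it ignores download time, so it only gives an upper bound on the polite fetch rate. The function names `crawl_days` and `polite_rate` are made up for illustration and are not part of Nutch.

```python
SECONDS_PER_DAY = 86400


def crawl_days(pages: int, pages_per_second: float) -> float:
    """Days needed to fetch `pages` at a steady rate of `pages_per_second`."""
    return pages / pages_per_second / SECONDS_PER_DAY


def polite_rate(hosts: int, delay_seconds: float) -> float:
    """Upper bound on pages/s when each host gets at most one request
    per `delay_seconds` (assumed per-host delay, download time ignored)."""
    return hosts / delay_seconds


if __name__ == "__main__":
    # Rate reported in the logs: 0.5 pages/s for about 1M pages.
    print(f"observed rate: {crawl_days(1_000_000, 0.5):.1f} days")

    # Single host with the default 5 second delay versus a 1 second delay.
    for delay in (5.0, 1.0):
        rate = polite_rate(hosts=1, delay_seconds=delay)
        print(f"delay {delay:.0f}s: at most {rate:.1f} p/s, "
              f"{crawl_days(1_000_000, rate):.1f} days")
```

Under these assumptions, a crawl confined to one host is bounded by the delay rather than by the fetcher implementation, so most of the gain from a faster fetcher would show up when the fetch list is spread over many hosts.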
