Seems very slow. What is your platform/OS?
I crawl 1 million pages in about an hour in most cases. I have one client with a huge whitelist, so I'll give that a whirl and get some more numbers.

When you do a crawl, is it based upon injected URLs or a large depth? Are you running into per-host connection limits and such? Do you have local DNS servers that you are using for name resolution? A few sketches below the quoted message illustrate what I mean.

thanks,
-byron

--- "Insurance Squared Inc." <[EMAIL PROTECTED]> wrote:

> Would anyone care to comment on the speed of this please? Seems
> awfully long to me.
>
> 20 threads, a crawl took 25 hours for about 400K URLs. It's now
> been updating for 20 hours and is not yet complete.
>
> System:
> - Nutch 0.7
> - P4 2.8, 1 GB of RAM
> - No problems on the internet connection (I had to throttle back
>   the number of open threads).
> - We do have a pretty heavy whitelist in the regular expression
>   filter for domains.
>
> Two days to crawl and index 400K pages is too long. Is my answer
> as simple as getting bigger hardware and paying for a bigger pipe?
>
> Thanks.
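On the per-host connection question: the relevant knobs live in
conf/nutch-site.xml, which overrides conf/nutch-default.xml. A minimal
sketch, assuming the 0.7-era property names and <nutch-conf> root
element; the values are examples, not recommendations, so check them
against the nutch-default.xml in your own install:

<nutch-conf>
  <property>
    <name>fetcher.threads.fetch</name>
    <value>20</value>
    <!-- total fetcher threads for the whole crawl -->
  </property>
  <property>
    <name>fetcher.threads.per.host</name>
    <value>1</value>
    <!-- max simultaneous requests to any single host -->
  </property>
  <property>
    <name>fetcher.server.delay</name>
    <value>5.0</value>
    <!-- politeness delay, in seconds, between successive
         requests to the same host -->
  </property>
</nutch-conf>

The arithmetic matters when a domain whitelist is in play: if the
filter restricts you to, say, 50 hosts, then one request per host
every 5 seconds caps the fetcher at about 10 pages/second (~860K
pages/day) no matter how many threads you run or how big the pipe
is. More distinct hosts or a shorter delay moves that ceiling;
bigger hardware doesn't.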
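On the whitelist itself: the regex filter tests every URL against
every pattern in conf/regex-urlfilter.txt in order, so a very long
whitelist adds CPU cost to each fetched URL and each extracted
outlink. A sketch, with hypothetical client domains:

# conf/regex-urlfilter.txt -- each URL is matched against the
# patterns in order until one hits, so cost grows with list length.
# (example-client-a/b are made-up names.)
+^http://([a-z0-9-]+\.)*example-client-a\.com/
+^http://([a-z0-9-]+\.)*example-client-b\.com/
# reject everything else
-.

If your build ships the urlfilter-prefix plugin, a plain prefix
list enabled through plugin.includes is matched against a trie and
scales much better for large domain whitelists; worth profiling
before you pay for hardware.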
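And on DNS: every newly seen host costs a resolver round trip, which
adds up over 400K URLs if lookups go to a distant server. A throwaway
Java sketch (the class name and usage are mine, not part of Nutch) to
time lookups from the crawl box:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Times one DNS lookup per host name given on the command line.
// Note the JVM caches successful lookups (networkaddress.cache.ttl),
// so start a fresh JVM each run to see cold-cache numbers.
public class DnsCheck {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            long start = System.currentTimeMillis();
            try {
                InetAddress addr = InetAddress.getByName(args[i]);
                long ms = System.currentTimeMillis() - start;
                System.out.println(args[i] + " -> "
                        + addr.getHostAddress() + " in " + ms + " ms");
            } catch (UnknownHostException e) {
                System.out.println(args[i] + " did not resolve");
            }
        }
    }
}

If lookups routinely take tens of milliseconds or worse, a local
caching resolver on or near the crawl machine usually pays for
itself.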
