Seems very slow.

What is your platform/OS?

I crawl 1 million pages in about an hour in most
cases. I have one client with a huge whitelist,
so I'll give that a whirl and get some more numbers.
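For scale: 400K URLs in 25 hours works out to about
16,000 pages an hour, call it 4.4 pages a second, while
1 million pages an hour is closer to 280 a second, so
we're talking about roughly a factor of 60 difference.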

When you do a crawl, is it based on injected URLs or
a large depth? Are you running into max connections
per host and such? Do you have local DNS servers that
you are using for name resolution?
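
If you haven't tuned anything yet, the first knobs to look at
are in conf/nutch-site.xml (which overrides conf/nutch-default.xml).
A minimal sketch, using the property names as I remember them from
nutch-default.xml, so double-check them against your 0.7 install,
and treat the values as starting points rather than recommendations:

<configuration>
  <property>
    <!-- number of concurrent fetcher threads (the 20 you run now) -->
    <name>fetcher.threads.fetch</name>
    <value>20</value>
  </property>
  <property>
    <!-- politeness pause, in seconds, between requests to one host -->
    <name>fetcher.server.delay</name>
    <value>1.0</value>
  </property>
  <property>
    <!-- how many times a thread waits on a busy host before giving up -->
    <name>http.max.delays</name>
    <value>3</value>
  </property>
  <property>
    <!-- socket timeout for a single page fetch, in milliseconds -->
    <name>http.timeout</name>
    <value>10000</value>
  </property>
</configuration>

I don't recall a per-host thread cap in 0.7; the server delay is
what spaces out requests to a single host, so a crawl concentrated
on a few big hosts is limited to about one page per delay interval
per host no matter how fat the pipe is. DNS isn't a Nutch setting
at all: resolution goes through the JVM and the OS, which is why a
local caching nameserver helps, independent of anything in these files.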

thanks,
-byron

--- "Insurance Squared Inc."
<[EMAIL PROTECTED]> wrote:

> Would anyone care to comment on the speed of this please? Seems awfully
> long to me.
> 
> 20 threads, a crawl took 25 hours for about 400K URLs. It's now been
> updating for 20 hours and is not yet complete.
> 
> System:
> - Nutch 0.7
> - P4 2.8, 1 gig of RAM
> - No problems on the internet connection (I had to throttle back the
>   number of open threads).
> - We do have a pretty heavy whitelist in the regular expression filter
>   for domains.
> 
> Two days to crawl and index 400K pages is too long. Is my answer as
> simple as getting bigger hardware and paying for a bigger pipe?
> 
> Thanks.

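About that heavy domain whitelist: as I understand it, the regex
filter (RegexURLFilter, reading conf/regex-urlfilter.txt) tries its
patterns top to bottom and stops at the first match, so every URL
that isn't accepted gets tested against the entire list. With
hundreds of domain patterns that is hundreds of regex evaluations
per outlink, which adds up fast during a long update. An
illustrative sketch, with made-up domains standing in for the real
whitelist:

# conf/regex-urlfilter.txt -- patterns tried in order, first match wins
# accept the whitelisted domains (example names, not the real list)
+^http://([a-z0-9-]+\.)*example-one\.com/
+^http://([a-z0-9-]+\.)*example-two\.com/
# ...one line per whitelisted domain...
# reject everything else
-.

Anchoring each pattern with ^ keeps the misses cheap, and collapsing
many domains into a few alternation patterns, e.g.
+^http://([a-z0-9-]+\.)*(example-one|example-two)\.com/, cuts the
number of passes per URL.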