Hi

I have changed the protocol-http plugin so that, for pages that have
already been crawled, Nutch reads from the local file system instead of
from the Internet. (I tried the file:// protocol, but it seemed to me that
the link information between pages was lost.) I now have it working, but
it is very slow: the "fetch" command took 10 minutes on 400 pages, on a
4-CPU box with 4 threads. I am wondering whether this is normal, because
at this rate it would take about 417 hours per box to read 1 million
pages, which is more than 17 days.
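
For reference, the estimate can be checked with a few lines of Python. This
is only a rough sketch: it assumes throughput stays constant at the rate
measured on the 400-page run, and the function name projected_hours is made
up for illustration (it is not part of Nutch).

```python
def projected_hours(pages_done, minutes_taken, target_pages):
    """Extrapolate total fetch time in hours, assuming a constant rate."""
    pages_per_minute = pages_done / minutes_taken  # measured throughput
    return target_pages / pages_per_minute / 60    # minutes -> hours


# Numbers from the run described above: 400 pages in 10 minutes.
hours = projected_hours(400, 10, 1_000_000)
days = hours / 24
print(f"{hours:.0f} hours, {days:.1f} days")  # prints "417 hours, 17.4 days"
```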

Any suggestions would be appreciated.

Zhen

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
