Hi folks,

I have crawled a million pages before, and now I would like to read them from the crawled local files instead of fetching them from the Internet again. I changed the http plugin to do so, but it is quite slow: running the "fetch" command took 10 minutes to read and parse only 400 files/pages. At that rate, reading 1 million would take over 400 hours, which is more than half a month. I used 4 threads on a 4-CPU box. Using more threads, such as 8, made it even slower.
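For reference, here is the arithmetic behind that estimate as a minimal sketch. It assumes throughput stays constant at the measured rate; the function name `estimate_hours` is only for illustration and is not part of Nutch.

```python
def estimate_hours(total_files: int, sample_files: int, sample_minutes: float) -> float:
    """Extrapolate total run time in hours from a timed sample, assuming a constant rate."""
    files_per_minute = sample_files / sample_minutes
    total_minutes = total_files / files_per_minute
    return total_minutes / 60


# Measured: 400 files in 10 minutes; target: 1 million files.
hours = estimate_hours(1_000_000, 400, 10)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
```

This is 40 files per minute across all 4 threads, or 10 files per minute per thread.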
Let me know if this speed is reasonable, and what I can do to improve it.

Thanks,
Zhen

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
