[Nutch-dev] Time of Reading Local Files

Jane Zhen Mon, 18 Sep 2006 06:36:33 -0700

Hi Folks,

I crawled a million pages earlier, and now I would like to read them from 
the crawled local files instead of fetching them from the Internet again. I 
changed the http plugin to do so, but it is quite slow: it took 10 minutes 
to read and parse only 400 files/pages (I was running the "fetch" command). 
At that rate, reading 1 million pages will take about 417 hours, which is 
more than half a month. I used 4 threads on a 4-CPU box. Using more threads, 
such as 8, made it even slower.
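
The estimate above can be checked with a quick calculation. This is only a 
sketch: the function name is made up for illustration, and it assumes the 
rate measured on the 400-page sample stays constant for the whole crawl. At 
40 pages per minute, 1 million pages works out to roughly 417 hours.

```python
def estimate_hours(sample_pages, sample_minutes, total_pages):
    """Extrapolate total run time in hours from a timed sample (hypothetical helper)."""
    pages_per_minute = sample_pages / sample_minutes
    total_minutes = total_pages / pages_per_minute
    return total_minutes / 60


# Figures from the message: 400 pages in 10 minutes, 1 million pages in total.
hours = estimate_hours(400, 10, 1_000_000)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
```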


Is this speed reasonable, and what can I do to improve it?

Thanks,

Zhen

