>Several days for 120,000 pages? That's very slow. Could you show some
>status lines in the log file? (grep "status:") What's the bandwidth you
>have?

AJ,

I mean: I haven't tried running with "-depth 20". I ran with "-depth 6" and
crawled 21,000 pages in 7-8 hours. By comparison, I mirrored 120,000 pages
from www.apache.org using Teleport Ultra, and that crawl took about 10 hours
in total (8 Mbps download, 10 threads).

In 3 separate tests I crawled 21,000 pages each time from a _local_ web site
(in the same LAN segment, 100 Mbps). With the existing plugins each run needed
about 8 hours for 21,000 pages, so I couldn't try 120,000 pages.
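
For a rough sense of scale, the figures above can be turned into average
fetch rates. This is only a back-of-the-envelope sketch, not anything from
Nutch: the helper names pages_per_second and hours_needed are hypothetical,
and it assumes the fetch rate stays constant for the whole crawl.

```python
# Back-of-the-envelope throughput comparison using the figures quoted above.
# Simplifying assumption: the fetch rate is constant for the whole crawl.

def pages_per_second(pages: int, hours: float) -> float:
    """Average fetch rate over a whole crawl."""
    return pages / (hours * 3600)

def hours_needed(pages: int, rate_pps: float) -> float:
    """Hours required to fetch `pages` at a constant `rate_pps`."""
    return pages / rate_pps / 3600

nutch_lan = pages_per_second(21_000, 8)    # local site, 100 Mbps LAN
teleport = pages_per_second(120_000, 10)   # www.apache.org, 8 Mbps, 10 threads

print(f"Nutch (LAN):    {nutch_lan:.2f} pages/s")
print(f"Teleport Ultra: {teleport:.2f} pages/s")
print(f"120,000 pages at the Nutch rate: {hours_needed(120_000, nutch_lan):.1f} h")
```

At the LAN rate (about 0.73 pages/s), 120,000 pages would take roughly 46
hours, versus about 10 hours (about 3.33 pages/s) for Teleport Ultra over an
8 Mbps link.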

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
