>Several days for 120,000 pages? That's very slow. Could you show some status lines in the log file? (grep "status:") What's the bandwidth you have?
AJ, to clarify: I haven't tried running with "-depth 20". I ran with "-depth 6" and crawled 21,000 pages in 7-8 hours.

I mirrored 120,000 pages from www.apache.org using Teleport Ultra; that crawl took about 10 hours in total (8 Mbps download, 10 threads).

In each of 3 tests I crawled 21,000 pages from a _local_ web site (in the same LAN segment, 100 Mbps). The existing plugins required 8 hours per 21,000 pages, so I couldn't try 120,000 pages.
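To put the figures above side by side, here is a rough back-of-the-envelope sketch in Python. The helper name `pages_per_second` is made up for illustration, and the extrapolation to 120,000 pages assumes throughput stays constant as the crawl grows, which may not hold in practice.

```python
def pages_per_second(pages: int, hours: float) -> float:
    """Average crawl throughput in pages per second."""
    return pages / (hours * 3600)


# Figures quoted in this message.
nutch_low = pages_per_second(21_000, 8)    # "-depth 6" run, 8-hour end of the 7-8 hour range
nutch_high = pages_per_second(21_000, 7)   # "-depth 6" run, 7-hour end of the range
teleport = pages_per_second(120_000, 10)   # Teleport Ultra mirror of www.apache.org

# Assumption: throughput scales linearly with the number of pages.
hours_for_120k = 120_000 / (nutch_low * 3600)

print(f"Existing plugins: {nutch_low:.2f}-{nutch_high:.2f} pages/s")
print(f"Teleport Ultra:   {teleport:.2f} pages/s")
print(f"120,000 pages at the slower rate: about {hours_for_120k:.0f} hours")
```

Under that linear assumption, the 21,000-page runs work out to well under one page per second, against a little over three pages per second for the Teleport Ultra mirror.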
