Dear List,

My process fetches from only 10 domains, but they are very big, with millions of pages on each site. I now wonder why, after 2 weeks and 17 crawl-fetch cycles, I have only about 30,000 pages, and the crawl seems to be stagnating.
How would you accelerate fetching?

My current parameters (using Nutch-1.2):

  topN: 40,000
  depth: 8
  adddays: 30
  fetcher.server.delay: 1
  db.max.outlinks.per.page: 500

All parameters not mentioned have their default values, and regex-urlfilter.txt is unchanged from the default as well.

Best Regards
Thomas
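P.S. For reference, here is a sketch of how the two property-based settings above would look as overrides, assuming they are set in conf/nutch-site.xml (the file location is an assumption; the values are the ones listed above). topN, depth and adddays are passed on the command line, so they do not appear here.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Delay in seconds between successive requests to the same server -->
  <property>
    <name>fetcher.server.delay</name>
    <value>1.0</value>
  </property>
  <!-- Maximum number of outlinks processed per page -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>500</value>
  </property>
</configuration>
```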

