Dear List,

My crawl covers only 10 domains, but they are very large, with millions of 
pages on each site. I now wonder why, after 2 weeks and 17 crawl-fetch cycles, 
I have fetched only about 30,000 pages, and the number seems to be stagnating.

How would you accelerate fetching?

My current parameters (using Nutch-1.2):
topN: 40,000
depth: 8
adddays: 30
fetcher.server.delay: 1
db.max.outlinks.per.page: 500

All parameters not mentioned have their default values, and 
regex-urlfilter.txt is unchanged as well.
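
As a sanity check, here is a rough sketch of the throughput these settings 
should allow if the per-host politeness delay were the only limit. It assumes 
one fetcher thread per host (which I believe is the default for 
fetcher.threads.per.host), ignores download and parse time, and assumes the 
URLs are spread evenly over the 10 hosts. The function names are made up for 
illustration and are not part of Nutch.

```python
SECONDS_PER_DAY = 86_400


def max_pages_per_day(hosts, delay_s, threads_per_host=1):
    """Upper bound on pages/day when only the per-host delay limits fetching.

    Assumption: each thread issues one request per `delay_s` seconds
    and download time is negligible.
    """
    per_host_per_second = threads_per_host / delay_s
    return int(hosts * per_host_per_second * SECONDS_PER_DAY)


def seconds_for_segment(top_n, hosts, delay_s, threads_per_host=1):
    """Lower bound on the time to fetch one segment of `top_n` URLs,
    assuming the URLs are spread evenly over the hosts."""
    return top_n * delay_s / (hosts * threads_per_host)


if __name__ == "__main__":
    # Values from this mail: 10 hosts, fetcher.server.delay = 1, topN = 40,000
    print(max_pages_per_day(hosts=10, delay_s=1.0))                   # 864000
    print(seconds_for_segment(top_n=40_000, hosts=10, delay_s=1.0))   # 4000.0
```

Under these assumptions a full topN segment would need only a bit more than 
an hour, and 17 cycles could have fetched up to 680,000 pages, so the delay 
alone does not seem to explain the 30,000 pages I see.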

Best Regards
Thomas


