Hi there,

1) I ran several test crawls to fetch pages from two websites, with the fetch depth set to 10. After checking the log files, I found that the number of links actually fetched differs greatly between the two sites. On one site, which has a lot of news, only the first two depths ran properly and only 5 links were fetched, although the site contains far more links than that. On the other site, the crawl ran through all 10 rounds and fetched hundreds of links. Has anyone had a similar experience? Should I set up the configuration files in /conf/?

2) Also, in the Nutch/conf/ directory I found several configuration files. So far I have only modified crawl-urlfilter.txt so that it accepts all URLs (*.*). Is that appropriate? I have not touched any of the other conf files. Is there a guideline on how to use these files?

Thanks,
Michael

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
