The general problem is URLs like http://www.agriturismo.pg.it/storia-citta-umbria/index.html:
custom "not found" pages that generate an infinite crawler loop on the site. It's not a rare case when one (like me) tries to fetch
the whole web.
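
To make the failure mode concrete, here is a minimal sketch of a heuristic that flags URLs likely to come from such a loop. It assumes the "not found" page emits relative links that keep extending the path, so the same path segment repeats or the path grows very deep; the real shape of the looping URLs on that site may differ. The function name, the thresholds, and the example.org URL are hypothetical and are not part of Nutch.

```python
from collections import Counter
from urllib.parse import urlsplit


def looks_like_crawler_trap(url, max_repeats=2, max_depth=12):
    """Heuristic check: True if the URL path is suspiciously deep
    or contains the same segment more than max_repeats times.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    path = urlsplit(url).path
    # Drop empty segments produced by leading or doubled slashes.
    segments = [s for s in path.split("/") if s]
    if len(segments) > max_depth:
        return True
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())


# The URL from this message is short and has no repeated segment.
normal = "http://www.agriturismo.pg.it/storia-citta-umbria/index.html"
# Hypothetical URL of the kind a looping "not found" page might produce.
looped = "http://example.org/a/b/a/b/a/b/index.html"

print(looks_like_crawler_trap(normal))
print(looks_like_crawler_trap(looped))
```

A check like this only limits the damage inside the fetch list; it cannot tell a custom "not found" page from a real one, so legitimate deep or repetitive paths would also be skipped.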
BTW, if you want, you can test a Nutch search on 50,000,000 pages (not URLs) at http://crawlers.iltrovatore.it:8088/search.jsp


massimo



Doug Cutting wrote:

Massimo Miccoli wrote:

In any case, circular links are a big problem for Nutch, not only for the analyze tool but also for fetcher speed and WebDB size. Any solution?


What is the general problem with Nutch's handling of circular links? Nearly every site has them. I am able to crawl, index and search sites with circular links.

Doug


-------------------------------------------------------
This SF.Net email is sponsored by: New Crystal Reports XI.
Version 11 adds new functionality designed to reduce time involved in
creating, integrating, and deploying reporting solutions. Free runtime info,
new features, or free trial, at: http://www.businessobjects.com/devxi/728
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers



