Massimo Miccoli wrote:
The general problem is URLs like http://www.agriturismo.pg.it/storia-citta-umbria/index.html:
custom "not found" pages that generate an infinite crawler loop on the site.

You're referring to error pages that do not return 404?

In another thread I just suggested a way to handle these:

http://www.mail-archive.com/nutch-user%40incubator.apache.org/msg00286.html

The URL you mention is amenable to this solution. Its title contains the string "pagina di errore", but it does not return a 404.
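One way to act on that observation is to treat a page as "not found" when its title matches a known error phrase, whatever HTTP status it was served with. Below is a minimal Java sketch, assuming you already have the status code and the raw HTML. The class `SoftErrorPageDetector`, its methods, and the phrase list are hypothetical and not part of Nutch's API. The regex title extraction is a rough stand-in for a real HTML parser.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Nutch): flags "soft 404" pages, i.e. error
 * pages served with a non-404 status, by looking for known phrases in <title>.
 */
public class SoftErrorPageDetector {

    // Case-insensitive; DOTALL so a title that spans several lines still matches.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Example phrases only; a real crawl would configure its own list.
    private static final List<String> ERROR_PHRASES =
        List.of("pagina di errore", "page not found");

    /** Returns the trimmed page title, or an empty string if there is none. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** True if the page should be treated as "not found". */
    static boolean isSoftErrorPage(int httpStatus, String html) {
        if (httpStatus == 404) {
            return true; // a real 404 needs no heuristic
        }
        String title = extractTitle(html).toLowerCase(Locale.ROOT);
        for (String phrase : ERROR_PHRASES) {
            if (title.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample page: served with status 200 but titled as an error page.
        String page = "<html><head><title>Agriturismo - Pagina di errore</title></head>"
                    + "<body>...</body></html>";
        System.out.println(isSoftErrorPage(200, page)); // prints "true"
    }
}
```

Pages flagged this way can then be handled like genuine 404s (not indexed and their outlinks not followed), which is what breaks the loop.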

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
