Author: theli
Date: 2006-03-09 14:04:57 +0100 (Thu, 09 Mar 2006)
New Revision: 1870

Modified:
   trunk/source/de/anomic/data/robotsParser.java
Log:
*) more correct robots.txt validation
   - isDisallowed now uses getFile instead of getPath

Modified: trunk/source/de/anomic/data/robotsParser.java
===================================================================
--- trunk/source/de/anomic/data/robotsParser.java	2006-03-09 12:35:50 UTC (rev 1869)
+++ trunk/source/de/anomic/data/robotsParser.java	2006-03-09 13:04:57 UTC (rev 1870)
@@ -67,6 +67,11 @@
  * It only parses the Deny Part, yet.
  *
  *
  * http://www.robotstxt.org/wc/norobots-rfc.html
+ *
+ * TODO:
+ *    - On the request attempt resulted in temporary failure a robot
+ *      should defer visits to the site until such time as the resource
+ *      can be retrieved.
  */
 public final class robotsParser{
 
@@ -263,7 +268,7 @@
             }
         }
 
-        if (robotsTxt4Host.isDisallowed(nexturl.getPath())) {
+        if (robotsTxt4Host.isDisallowed(nexturl.getFile())) {
             return true;
         }
         return false;
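
The effect of switching from getPath to getFile can be illustrated with a small standalone sketch. It assumes nexturl is a java.net.URL, for which getPath() returns only the path while getFile() returns the path plus the query string. The class PathVsFileDemo, its parse helper, the prefix-matching isDisallowed method, and the example.org URL are hypothetical stand-ins for illustration; they are not the robotsParser code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class PathVsFileDemo {

    // Hypothetical stand-in for a robots.txt check: a target is disallowed
    // when it starts with any non-empty Disallow rule (plain prefix match).
    static boolean isDisallowed(String target, List<String> disallowRules) {
        for (String rule : disallowRules) {
            if (!rule.isEmpty() && target.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    // Helper that turns the checked MalformedURLException into an unchecked one.
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed example: a Disallow rule that includes part of a query string.
        List<String> rules = Arrays.asList("/search?action=");
        URL nexturl = parse("http://example.org/search?action=edit");

        // getPath() drops the query, so the rule cannot match.
        System.out.println(nexturl.getPath());
        System.out.println(isDisallowed(nexturl.getPath(), rules));

        // getFile() keeps the query, so the rule matches.
        System.out.println(nexturl.getFile());
        System.out.println(isDisallowed(nexturl.getFile(), rules));
    }
}
```

Under these assumptions, a rule that mentions a query string is only honoured when the check receives the value of getFile(), which is consistent with the log message's claim of more correct validation.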
