Author: theli
Date: 2006-03-09 14:04:57 +0100 (Thu, 09 Mar 2006)
New Revision: 1870

Modified:
   trunk/source/de/anomic/data/robotsParser.java
Log:
*) more correct robots.txt validation
   - isDisallowed now uses getFile instead of getPath

Modified: trunk/source/de/anomic/data/robotsParser.java
===================================================================
--- trunk/source/de/anomic/data/robotsParser.java       2006-03-09 12:35:50 UTC (rev 1869)
+++ trunk/source/de/anomic/data/robotsParser.java       2006-03-09 13:04:57 UTC (rev 1870)
@@ -67,6 +67,11 @@
  * It only parses the Deny Part, yet.
  *  *
  * http://www.robotstxt.org/wc/norobots-rfc.html
+ * 
+ * TODO:
+ *      - If a request attempt results in a temporary failure, a robot
+ *      should defer visits to the site until the resource can be
+ *      retrieved.
  */
 public final class robotsParser{
     
@@ -263,7 +268,7 @@
             }
         }
         
-        if (robotsTxt4Host.isDisallowed(nexturl.getPath())) {
+        if (robotsTxt4Host.isDisallowed(nexturl.getFile())) {
             return true;        
         }        
         return false;
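
Note on the getPath/getFile change: the sketch below illustrates why the switch can matter, assuming nexturl is a java.net.URL (the diff does not show its type). For java.net.URL, getPath() returns only the path, while getFile() returns the path plus the query string, so a deny rule that includes a query part can only match against getFile(). The class RobotsPathDemo, its helper methods, the example URL and the prefix-matching rule are all hypothetical and are not taken from robotsParser.java.

```java
import java.net.MalformedURLException;
import java.net.URI;

// Hypothetical demo class; not part of YaCy.
final class RobotsPathDemo {

    // Returns { getPath(), getFile() } for the given URL string.
    static String[] parts(String url) {
        try {
            java.net.URL u = URI.create(url).toURL();
            return new String[] { u.getPath(), u.getFile() };
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Assumed stand-in for isDisallowed: plain prefix matching
    // of the target against each deny rule.
    static boolean isDisallowed(String[] denyList, String target) {
        for (String rule : denyList) {
            if (target.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] deny = { "/search?action=" };
        String[] p = parts("http://example.org/search?action=edit");

        // getPath() drops the query string, so the rule cannot match.
        System.out.println(p[0] + " -> " + isDisallowed(deny, p[0]));
        // getFile() keeps the query string, so the rule matches.
        System.out.println(p[1] + " -> " + isDisallowed(deny, p[1]));
    }
}
```

With these assumptions, the check against getPath() lets the URL through, while the check against getFile() blocks it.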

_______________________________________________
YaCy-svn mailing list
[email protected]
http://lists.berlios.de/mailman/listinfo/yacy-svn
