Hi,

try:

In my conf/crawl-urlfilter.txt I have tried:
+^http://([a-z0-9]*\.)*nutch.org/
+^http://([a-z0-9]*\.)*nutch.org

+^http://*.nutch.org/
This would never work.

A star does not mean "any characters". It is a quantifier: it repeats the character (or group) in front of it zero or more times.
A dot matches any single character.
\. matches a literal dot.
Please google for "regex" or "perl regular expressions".
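To make this concrete, here is a small Java check of the patterns above using java.util.regex. It is a sketch under one assumption: that a filter rule counts as matching when the pattern is found anywhere in the URL (Matcher.find()). Verify that against the Nutch version you run. The class and method names are made up for illustration. Note that inside a Java string the backslash has to be doubled, so the config-file text \. becomes "\\." here.

```java
import java.util.regex.Pattern;

public class RegexBasicsDemo {
    // Hypothetical helper: true if the regex is found anywhere in the URL.
    // Assumption: substring search via Matcher.find(), similar to a regex URL filter.
    static boolean matches(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://www.nutch.org/";

        // '*' only repeats the '/' before it, and '.' is "any one character",
        // so this is NOT a wildcard for "any subdomain".
        System.out.println(matches("^http://*.nutch.org/", url));                 // false

        // The broken rule even accepts the bare domain:
        // '/*' matches nothing and '.' consumes the second '/'.
        System.out.println(matches("^http://*.nutch.org/", "http://nutch.org/")); // true

        // ([a-z0-9]*\.)* means zero or more "label." prefixes; '\.' is a literal dot.
        System.out.println(matches("^http://([a-z0-9]*\\.)*nutch.org/", url));    // true

        // The unescaped '.' in "nutch.org" matches any character, e.g. 'X'.
        System.out.println(matches("^http://([a-z0-9]*\\.)*nutch.org/", "http://www.nutchXorg/"));   // true
        // Escaping it restricts the match to a real dot.
        System.out.println(matches("^http://([a-z0-9]*\\.)*nutch\\.org/", "http://www.nutchXorg/")); // false
    }
}
```

In practice that means writing the dot in the domain as \. in the filter file as well.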



my urls file contains:
http://www.nutch.org
If you ask nutch to check against a pattern that ends with a slash, your URL needs that slash too.
Try: http://www.nutch.org/
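A quick way to see the trailing-slash issue is to test both quoted rules against the URL with and without the slash. This is a sketch that again assumes find()-style (substring) matching, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class TrailingSlashDemo {
    // Hypothetical helper; assumption: rule is tested with Matcher.find().
    static boolean matches(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    public static void main(String[] args) {
        String withSlash = "^http://([a-z0-9]*\\.)*nutch.org/"; // rule ending in '/'
        String noSlash = "^http://([a-z0-9]*\\.)*nutch.org";    // same rule without '/'

        System.out.println(matches(withSlash, "http://www.nutch.org"));  // false: slash missing
        System.out.println(matches(withSlash, "http://www.nutch.org/")); // true
        System.out.println(matches(noSlash, "http://www.nutch.org"));    // true
    }
}
```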


Bye

Matthias


_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
