The rules in crawl-urlfilter.txt have to be written to match the URLs you want to crawl.
I have one more question: does the URL in the 'urls' file need to match conf/crawl-urlfilter.txt exactly?
It is just that I want to start my search from an .asp page (including a query string).
The rules are independent from your start page.
Would I be able to have just the server domain in conf/crawl-urlfilter.txt?
Yes.
> +^http://([a-z0-9]*\.)*nutch.org
This rule will allow all pages starting with http:// from the domain nutch.org. All subdomains are included if they consist only of a-z and 0-9. Maybe you should also add "-", which is a valid character in subdomains:
^http://([a-z0-9\-]*\.)*nutch.org
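
To see what the extra "-" changes, here is a minimal sketch in Python that tries both patterns against a few URLs. It assumes the filter accepts a URL when the pattern is found starting at the beginning of the URL, and that Python's regex syntax behaves like the filter's for these simple patterns. The sample URLs and the helper name is_allowed are made up for illustration only.

```python
import re

# Patterns copied from the rules above, without the leading "+" that
# crawl-urlfilter.txt uses to mark an "accept" rule.
WITHOUT_HYPHEN = re.compile(r"^http://([a-z0-9]*\.)*nutch.org")
WITH_HYPHEN = re.compile(r"^http://([a-z0-9\-]*\.)*nutch.org")


def is_allowed(pattern: re.Pattern, url: str) -> bool:
    """Return True if the pattern is found in the URL.

    Assumption: the filter only needs to find the pattern; the "^" anchor
    pins it to the start of the URL, and nothing anchors the end.
    """
    return pattern.search(url) is not None


# Hypothetical sample URLs, chosen only to exercise the patterns.
SAMPLES = [
    "http://nutch.org/",
    "http://www.nutch.org/docs/index.html",
    "http://my-site.nutch.org/page.asp?id=1",
    "http://example.com/",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(
            url,
            "without '-':", is_allowed(WITHOUT_HYPHEN, url),
            "with '-':", is_allowed(WITH_HYPHEN, url),
        )
```

Only the pattern that includes "-" accepts the hyphenated subdomain my-site.nutch.org; both accept nutch.org and www.nutch.org, and both reject example.com.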
Bye
Matthias
--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events
