I have one more question: does the URL in the 'urls' file
need to exactly match conf/crawl-urlfilter.txt?
No. The rules in crawl-urlfilter.txt only have to be written so that they accept the URLs you want to crawl.

It is just that I want to start my search from an .asp
page (including a query string).
The rules are independent from your start page.

Would I be able to have just the server domain in conf/crawl-urlfilter.txt?
Yes.

> +^http://([a-z0-9]*\.)*nutch.org
This rule allows all pages whose URL starts with http:// and belongs to the domain nutch.org. Subdomains are included only if they consist of the characters a-z and 0-9. You may also want to add "-", which is a valid character in subdomains:
^http://([a-z0-9\-]*\.)*nutch.org
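
To see the difference between the two patterns, here is a small sketch in Python. It only exercises the regular expressions themselves, not Nutch; the sample URLs and the accepts() helper are made up for illustration, and it assumes the filter applies the pattern from the start of the URL (the leading "+" of the rule is filter-file syntax and is left out).

```python
import re

# The two patterns from the filter rules, without the leading "+".
ORIGINAL = re.compile(r"^http://([a-z0-9]*\.)*nutch.org")
WITH_HYPHEN = re.compile(r"^http://([a-z0-9\-]*\.)*nutch.org")


def accepts(pattern, url):
    """Return True if the pattern matches at the start of the URL."""
    return pattern.search(url) is not None


# Hypothetical sample URLs, chosen only to illustrate the behaviour.
samples = [
    "http://nutch.org/",
    "http://www.nutch.org/start.asp?id=1",
    "http://my-host.nutch.org/",
    "http://example.com/",
]

for url in samples:
    print(url, accepts(ORIGINAL, url), accepts(WITH_HYPHEN, url))
```

Only the hyphenated host name is treated differently: the first pattern rejects it, the second accepts it. The start page with a query string passes both patterns, since they only look at the beginning of the URL.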


Bye

Matthias

--
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - Die Suchmaschine für Lokales und Events

