Hi list,

I'm running a crawl over a site, but it seems to be fetching pages
outside the domain allowed by my URL filter regex:

+^http://([a-z0-9]*\.)*curtin.edu.au/

For example:

fetching http://www.environment.sa.gov.au/epa/used_packaging.html
fetching http://abc.net.au/triplej/hottest100/ringtones/default.htm
fetching http://dmoz.org/News/Newspapers/

This seems wrong to me. Is there some way to make sure I haven't made any
stupid mistakes?
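
One way to rule out the pattern itself is to test it against the fetched URLs outside of Nutch. The following is only a minimal sketch: it uses Python's re module as a stand-in for the Java regex engine (the syntax in this pattern should behave the same in both), it assumes the filter does an unanchored "find" on each URL, and the in-domain URL at the end is a made-up example, not one from the crawl.

```python
import re

# The rule from the filter file, minus the leading '+' (which only marks
# the rule as "accept"). The dots in "curtin.edu.au" are left unescaped,
# exactly as in the original rule, so each one matches any character.
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*curtin.edu.au/")

# The three out-of-domain URLs reported as fetched.
FETCHED = [
    "http://www.environment.sa.gov.au/epa/used_packaging.html",
    "http://abc.net.au/triplej/hottest100/ringtones/default.htm",
    "http://dmoz.org/News/Newspapers/",
]


def accepted(url):
    """Return True if the accept rule matches the URL (assumed 'find' semantics)."""
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    for url in FETCHED:
        print(accepted(url), url)
    # Hypothetical in-domain URL, used only to show a positive match.
    print(accepted("http://www.curtin.edu.au/"), "http://www.curtin.edu.au/")
```

If the rule rejects all three fetched URLs here, the pattern is probably not the cause, and it may be worth checking whether this filter file is actually the one being applied to the crawl.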

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
