Nutch doesn't seem to be collecting anchor tags with relative hrefs, such as:

<a href="somePath">Title</a>

When the href includes a hostname, as below, Nutch crawls it just fine:

<a href="http://myHostname/subpath/somePath">Title</a>


The last regex in my regex-urlfilter.txt is:

+.

which should match anything, but it doesn't seem to pick up these relative URLs.
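
To make my expectation concrete, here is a minimal sketch in plain Java (not Nutch code). It assumes the relative href gets resolved against the URL of the page it was found on before any filtering, and that the "+." rule behaves like a find-style match on the regex ".". The class name, the method names and the page URL http://myHostname/subpath/page.html are made up for illustration.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class RelativeLinkSketch {

    // Resolve an href against the URL of the page it appeared on
    // (standard URI reference resolution, nothing Nutch-specific).
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Assumption: the "+." rule accepts a URL when the regex "." is
    // found anywhere in it, i.e. for any string with a non-newline character.
    static boolean passesCatchAll(String url) {
        return Pattern.compile(".").matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, chosen only for this example.
        String pageUrl = "http://myHostname/subpath/page.html";

        String resolved = resolve(pageUrl, "somePath");
        System.out.println("resolved:   " + resolved);
        System.out.println("passes +. : " + passesCatchAll(resolved));
    }
}
```

If that assumption holds, the relative link should end up as an absolute URL under the same host and pass the last rule, so I would guess it is being dropped somewhere else (an earlier rule in the file, or during parsing).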

Any help would be greatly appreciated.

Thanks
