What parser are you using? The Tika parser will resolve relative URLs. You
can use the parserchecker to debug and test.
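
To make clear what "resolving" means here, below is a rough sketch in Python.
It is not Nutch or Tika code; it only uses the standard library to show how a
relative href is combined with the URL of the page it was found on. The base
page URL (index.html) is made up for the example, since the original message
does not say which page contains the link.

```python
from urllib.parse import urljoin

# Assumption: hypothetical URL of the page that contains the links.
# The original message does not name the page, only the host and subpath.
BASE_PAGE = "http://myHostname/subpath/index.html"


def resolve(base_url, href):
    """Resolve an href against the URL of the page it appears on."""
    return urljoin(base_url, href)


# The relative link from the question.
resolved_relative = resolve(BASE_PAGE, "somePath")

# The absolute link from the question; an absolute href ignores the base.
resolved_absolute = resolve(BASE_PAGE, "http://myHostname/subpath/somePath")

print(resolved_relative)
print(resolved_absolute)
```

If the parser resolves relative links this way, both forms end up as the same
absolute URL before the URL filters see them, so a final "+." rule would accept
either one.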

> Nutch doesn't seem to be collecting anchor tags similar to:
> 
> <a href="somePath">Title</a>
> 
> when there is a hostname included like below, Nutch crawls it just fine:
> 
> <a href="http://myHostname/subpath/somePath">Title</a>
> 
> 
> The last regex in my regex-urlfilter.txt is:
> 
> +.
> 
> which should match anything, but doesn't seem to get these relative URLs.
> 
> Any help would be greatly appreciated.
> 
> thanks
