On Monday 12 September 2011 18:08:50 Jukka Zitting wrote:
> > For pages without base href the wrong-link/ is resolved to
> > http://example.org/content/wrong-link/. The new page also contains the
> > same url list as above so the next wrong link is resolved as
> > http://example.org/content/wrong-link/wrong-link/......
> > 
> > An endless nightmare for a crawler :)
> 
> How would not resolving the links in Tika help in this case? To crawl
> the site, the crawler would in any case have to resolve the links, and
> come up with the exact same resolved URLs.
> 

I could choose not to collect those relative URLs as outlinks. Right now I
cannot determine whether a URL was originally relative.
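
To illustrate what I mean, here is a rough sketch of the filtering that would
become possible if the raw href were still available. It is not Tika code:
the class OutlinkFilter and the methods isRelative and collectOutlinks are
hypothetical names, and it assumes the unresolved href strings can be
obtained somehow and are syntactically valid URIs. It uses only java.net.URI.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tika or any crawler.
class OutlinkFilter {

    // Treats any href without a scheme as relative. Note that this also
    // covers root-relative ("/path") and protocol-relative ("//host/path")
    // references. URI.create throws IllegalArgumentException on hrefs that
    // are not valid URIs (for example, ones containing spaces).
    static boolean isRelative(String rawHref) {
        return !URI.create(rawHref).isAbsolute();
    }

    // Resolves each raw href against the page URL, optionally skipping
    // the ones that were relative before resolution.
    static List<String> collectOutlinks(String pageUrl, List<String> rawHrefs,
                                        boolean skipRelative) {
        URI base = URI.create(pageUrl);
        List<String> outlinks = new ArrayList<>();
        for (String href : rawHrefs) {
            if (skipRelative && isRelative(href)) {
                continue; // this decision needs the unresolved form
            }
            outlinks.add(base.resolve(href).toString());
        }
        return outlinks;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList("wrong-link/", "http://example.org/other/");
        // Without filtering, each crawled page yields a deeper wrong-link/ URL.
        System.out.println(collectOutlinks("http://example.org/content/", hrefs, false));
        System.out.println(collectOutlinks("http://example.org/content/wrong-link/", hrefs, false));
        // With filtering, only the originally absolute link is kept.
        System.out.println(collectOutlinks("http://example.org/content/", hrefs, true));
    }
}
```

The check has to run before resolution; once only the resolved URL is handed
over, both links above look the same to the crawler.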

> BR,
> 
> Jukka Zitting

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
