Hi,

On Mon, Sep 12, 2011 at 4:58 PM, Markus Jelsma
<[email protected]> wrote:
> Since TIKA-287 all relative URLs are resolved to absolute ones regardless of the
> presence of the base element. This is not always desired behaviour.

Can you describe a use case where that's not the desired behaviour? I
would assume that a resolved URL is always preferred to an unresolved
one.

> Would it be possible to use some setting to instruct the parser not to resolve
> URLs if the base element doesn't exist or does not have an href attribute
> with a valid absolute URL?

Currently Tika looks at the CONTENT_LOCATION and RESOURCE_NAME_KEY
metadata keys for the default base URL. If neither is present and
there is no <base href=".."> element, then URLs in the document will
not be resolved.
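
To illustrate the fallback described above, here is a rough sketch in
plain Java. It is not Tika's actual code: the class and method names
(BaseUrlSketch, pickBase, resolve) are made up for this example, and
the precedence of <base href> over the metadata values is an
assumption on my part.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only; none of these names come from Tika.
class BaseUrlSketch {

    // Returns the first candidate that parses as an absolute URI, or null
    // if none does. Assumed order: <base href>, then the content location,
    // then the resource name.
    static URI pickBase(String baseHref, String contentLocation, String resourceName) {
        for (String candidate : new String[] {baseHref, contentLocation, resourceName}) {
            if (candidate == null) {
                continue;
            }
            try {
                URI uri = new URI(candidate.trim());
                if (uri.isAbsolute()) {
                    return uri;
                }
            } catch (URISyntaxException e) {
                // Not a usable URI; try the next candidate.
            }
        }
        return null;
    }

    // Resolves a link against the base. With no base (or an unparseable
    // link), the link is returned unchanged.
    static String resolve(String link, URI base) {
        if (base == null) {
            return link;
        }
        try {
            return base.resolve(new URI(link)).toString();
        } catch (URISyntaxException e) {
            return link;
        }
    }

    public static void main(String[] args) {
        // A bare file name is not an absolute URI, so it is skipped as a base.
        URI base = pickBase(null, "http://example.com/docs/index.html", "index.html");
        System.out.println(resolve("page2.html", base)); // http://example.com/docs/page2.html
        System.out.println(resolve("page2.html", null)); // page2.html
    }
}
```

So a caller that wants links left untouched would, under these
assumptions, need to avoid supplying an absolute URL in either
metadata key.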

BR,

Jukka Zitting
