Hi,

On Mon, Sep 12, 2011 at 4:58 PM, Markus Jelsma <[email protected]> wrote:
> Since TIKA-287 all relative URL's are resolved to absolutes regardless of the
> presence of the base element. This is not always desired behaviour.

Can you describe a use case where that's not the desired behaviour? I would
assume that a resolved URL is always preferred to an unresolved one.

> Would it be possible to use some setting to instruct the parser not to resolve
> URL's if the base element doesn't exist or does not have an href attribute
> with a valid absolute URL?

Currently Tika looks at the CONTENT_LOCATION and RESOURCE_NAME_KEY metadata
keys for the default base URL. If neither is present and there is no
<base href=".."> element, then URLs in the document will not be resolved.

BR,

Jukka Zitting
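
For illustration only, the fallback described above could be sketched roughly as follows. This is not Tika's actual code: it uses plain java.net.URI, the class and method names (BaseUrlSketch, pickBase, resolve) are hypothetical, and the precedence shown (base href first, then the content location, then the resource name) is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical sketch; not part of Tika.
final class BaseUrlSketch {

    // Returns the first candidate that parses as an absolute URI, if any.
    // Assumed order: <base href>, then content location, then resource name.
    static Optional<URI> pickBase(String baseHref, String contentLocation, String resourceName) {
        for (String candidate : new String[] {baseHref, contentLocation, resourceName}) {
            if (candidate == null) {
                continue;
            }
            try {
                URI uri = new URI(candidate.trim());
                if (uri.isAbsolute()) {
                    return Optional.of(uri);
                }
            } catch (URISyntaxException e) {
                // Unparseable candidate: fall through to the next one.
            }
        }
        return Optional.empty();
    }

    // Resolves href against the chosen base, or returns it unchanged
    // when no usable base is available.
    static String resolve(String href, String baseHref, String contentLocation, String resourceName) {
        Optional<URI> base = pickBase(baseHref, contentLocation, resourceName);
        if (!base.isPresent()) {
            return href;
        }
        try {
            return base.get().resolve(new URI(href)).toString();
        } catch (URISyntaxException e) {
            return href; // leave malformed links untouched
        }
    }
}
```

With no base href and no usable metadata value, resolve("page.html", null, null, null) returns the link unchanged, which matches the "will not be resolved" case above.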
