HtmlParser should resolve relative paths in <a href="xxx"> elements
-------------------------------------------------------------------

                 Key: TIKA-287
                 URL: https://issues.apache.org/jira/browse/TIKA-287
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.4
            Reporter: Ken Krugler


Currently, clients of the HtmlParser must keep track of the appropriate base 
URL themselves in order to resolve relative URLs in href="xxx" attributes.

The parser should use the metadata RESOURCE_NAME_KEY value as the base.

The parser should also watch for a <base> element in the <head> section, and 
use that to update the base URL.
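
The two points above could be sketched roughly as follows. This is only an
illustration of the requested behaviour, not Tika code: the class name
BaseTrackingHandler and its links() accessor are hypothetical, and the initial
base is assumed to be whatever string the caller stored under
RESOURCE_NAME_KEY. It uses plain java.net.URL resolution, so the query-string
case described in the next paragraph would still need separate handling.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of Tika: tracks the current base URL while
// SAX events arrive and resolves <a href> values against it.
final class BaseTrackingHandler extends DefaultHandler {
    private URL base; // null when no usable absolute base is known
    private final List<String> links = new ArrayList<>();

    // initialBase is assumed to come from the RESOURCE_NAME_KEY metadata value.
    BaseTrackingHandler(String initialBase) {
        this.base = parse(null, initialBase);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        String name = (local == null || local.isEmpty()) ? qName : local;
        String href = atts.getValue("href");
        if (href == null) {
            return;
        }
        if ("base".equalsIgnoreCase(name)) {
            // A <base href="..."> element replaces the base for later links.
            URL updated = parse(base, href);
            if (updated != null) {
                base = updated;
            }
        } else if ("a".equalsIgnoreCase(name)) {
            // Fall back to the raw attribute value if resolution fails.
            URL resolved = parse(base, href);
            links.add(resolved != null ? resolved.toExternalForm() : href);
        }
    }

    List<String> links() {
        return links;
    }

    private static URL parse(URL context, String spec) {
        if (spec == null) {
            return null;
        }
        try {
            return new URL(context, spec.trim());
        } catch (MalformedURLException e) {
            return null; // e.g. a bare file name with no absolute base
        }
    }
}
```

If the resource name is not an absolute URL (a bare file name, say), the
sketch simply leaves relative links unresolved until a <base> element
supplies one.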

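Note that special care must be taken to work around a known bug in the 
java.net.URL class: when the relative URL is only a query string (for 
example "?page=2") and the base URL doesn't end with a '/', the last path 
segment of the base URL is dropped from the resolved result.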
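
One possible workaround, sketched under assumptions: the class name
QueryAwareResolver and its resolve() method are hypothetical, and the approach
(handling query-only references by string manipulation instead of passing them
to java.net.URL) is just one way to do it, not necessarily what the parser
should adopt. A query-only reference keeps the base URL's full path and
replaces only its query and fragment; everything else is delegated to
java.net.URL as usual.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of Tika.
final class QueryAwareResolver {
    static String resolve(String base, String target) {
        try {
            URL baseUrl = new URL(base);
            String ref = target.trim();
            if (ref.startsWith("?")) {
                // Query-only reference: keep scheme, host and full path of
                // the base, and drop its existing query and fragment.
                String s = baseUrl.toExternalForm();
                int end = s.length();
                int query = s.indexOf('?');
                int fragment = s.indexOf('#');
                if (query >= 0) {
                    end = Math.min(end, query);
                }
                if (fragment >= 0) {
                    end = Math.min(end, fragment);
                }
                return s.substring(0, end) + ref;
            }
            // All other references use the standard resolution.
            return new URL(baseUrl, ref).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + target, e);
        }
    }
}
```

With this sketch, resolving "?x=1" against
"http://example.com/dir/page.html" keeps "page.html" in the result, while
ordinary relative paths such as "other.html" are resolved exactly as before.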

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
