HtmlParser should resolve relative paths in <a href="xxx"> elements
-------------------------------------------------------------------
                 Key: TIKA-287
                 URL: https://issues.apache.org/jira/browse/TIKA-287
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.4
            Reporter: Ken Krugler

Currently clients of the HtmlParser need to manually keep track of the appropriate base URL to use when resolving relative URLs in href="xxx" attributes.

The parser should use the metadata RESOURCE_NAME_KEY value as the base.

The parser should also watch for a <base> element in the <head> section, and use that to update the base URL.

Note that special care must be taken to work around a known bug in the Java URL() class, when the relative URL is a query string and the base URL doesn't end with a '/'.
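
A minimal sketch of the resolution logic described above, assuming the base URL has already been read from the metadata. It is not the HtmlParser implementation: the class name RelativeLinkResolver and its methods are hypothetical, and the workaround shown (prepending the last path segment of the base to a pure query-string target) is one common approach, not necessarily the one the parser should adopt.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, for illustration only; not part of Tika.
final class RelativeLinkResolver {

    private RelativeLinkResolver() {}

    // Returns the base to use for links: the document URL, updated by the
    // href of a <base> element if one was seen (baseHref may be null).
    static URL effectiveBase(URL documentUrl, String baseHref) throws MalformedURLException {
        if (baseHref == null || baseHref.trim().isEmpty()) {
            return documentUrl;
        }
        return new URL(documentUrl, baseHref.trim());
    }

    // Resolves an href value against the base URL.
    static URL resolve(URL base, String target) throws MalformedURLException {
        String spec = target.trim();
        if (spec.startsWith("?")) {
            // Pure query string: some JDK versions drop the last path segment
            // of the base, so prepend that segment explicitly. If the base
            // path ends with '/', the segment is empty and spec is unchanged.
            String path = base.getPath();
            String lastSegment = path.substring(path.lastIndexOf('/') + 1);
            spec = lastSegment + spec;
        }
        return new URL(base, spec);
    }

    // Convenience wrapper that works on strings and avoids checked exceptions.
    static String resolveToString(String documentUrl, String baseHref, String target) {
        try {
            URL base = effectiveBase(new URL(documentUrl), baseHref);
            return resolve(base, target).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + target, e);
        }
    }
}
```

With this sketch, resolving "?pid=1" against http://example.com/path/file.html yields http://example.com/path/file.html?pid=1, because the target is rewritten to "file.html?pid=1" before it reaches the URL constructor. Passing a non-null baseHref models a <base> element found in <head>.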