Does Tika have the ability to extract title, header, or footer
information by analyzing the content of the page (as opposed to the
metadata)? For example, it could find a title by looking for boldfaced,
centered content with blank lines above and/or below it. Or, in the case of
headers, if it were analyzing a letter, it could look for to/from address
information towards the top of the first page.
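
To make the title heuristic concrete, here is a rough sketch of the kind of
analysis I have in mind. It is not Tika API: the function name `guess_title`
and the `page_width` and `tolerance` parameters are made up for illustration,
and it assumes plain text whose layout whitespace has been preserved. Plain
text carries no font information, so the boldface check is left out.

```python
from typing import List, Optional


def guess_title(lines: List[str], page_width: int = 72,
                tolerance: int = 4) -> Optional[str]:
    """Return the first line that looks like a centered title, or None.

    A line qualifies when it is non-blank, has a blank line (or the page
    edge) directly above and below it, and its left and right margins
    within page_width differ by at most `tolerance` columns.
    """
    for i, raw in enumerate(lines):
        text = raw.strip()
        if not text:
            continue  # blank lines are never title candidates

        # Require blank lines (or the top/bottom of the page) around the line.
        above_blank = i == 0 or not lines[i - 1].strip()
        below_blank = i == len(lines) - 1 or not lines[i + 1].strip()
        if not (above_blank and below_blank):
            continue

        # Compare the left margin with the space remaining on the right.
        expanded = raw.expandtabs().rstrip()
        left = len(expanded) - len(expanded.lstrip())
        right = page_width - len(expanded)
        if left > 0 and abs(left - right) <= tolerance:
            return text
    return None


page = [
    "",
    " " * 12 + "Quarterly Report",
    "",
    "This report covers the third quarter.",
    "Revenue was flat.",
]
title = guess_title(page, page_width=40)
```

A real implementation would presumably need font and position data from the
parser rather than whitespace alone, which is part of why I am asking.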

If Tika doesn't go this far, are there any tools you would recommend for
this?
