Detect title and header or footer information in PDF based on page content?

Stefan Alder Sat, 16 Jul 2016 15:09:20 -0700

Does Tika have the ability to extract title and header or footer
information based on an analysis of the content on the page (as opposed to
metadata)?  For example, it could look for boldfaced, centered content with
blank lines above and/or below to find a title. Or, in the case of headers,
say it was analyzing a letter, it could look for to/from address
information towards the top of the first page.
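
To make the title heuristic concrete, here is a minimal Python sketch of the
kind of check I have in mind. It is not Tika code: the Line record, its
fields, and the 5% centering tolerance are all hypothetical, and it assumes
some other tool has already produced per-line text, bold flags, and
horizontal positions for the page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """Hypothetical per-line record; assumed to come from a layout-aware extractor."""
    text: str
    bold: bool
    left: float   # x coordinate of the line's left edge, in points
    right: float  # x coordinate of the line's right edge, in points


def is_centered(line: Line, page_width: float, tolerance: float = 0.05) -> bool:
    # Centered means the left and right margins are roughly equal.
    left_margin = line.left
    right_margin = page_width - line.right
    return abs(left_margin - right_margin) <= tolerance * page_width


def find_title(lines: List[Line], page_width: float) -> Optional[str]:
    """Return the first bold, centered line with a blank line above or below."""
    for i, line in enumerate(lines):
        if not line.text.strip():
            continue  # blank lines are separators, never titles
        # The top and bottom of the page count as blank neighbours.
        blank_above = i == 0 or not lines[i - 1].text.strip()
        blank_below = i == len(lines) - 1 or not lines[i + 1].text.strip()
        if line.bold and is_centered(line, page_width) and (blank_above or blank_below):
            return line.text.strip()
    return None


# Example on a US Letter page (612 points wide) with made-up coordinates.
page = [
    Line("", False, 0, 0),
    Line("Annual Report", True, 200, 412),
    Line("", False, 0, 0),
    Line("This is body text that spans the page.", False, 72, 540),
]
print(find_title(page, 612))  # prints: Annual Report
```

A real detector would probably also want font size and would need a
different rule for letter-style to/from address blocks; this only covers the
bold, centered, blank-line case.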


If Tika doesn't go this far, are there any tools one would recommend to do
this?
