Extract headings from pdf

Juhan Voolaid Sun, 11 Mar 2012 04:09:17 -0700

Hello

I am totally new to the internals of the PDF format and to the PDFBox library. My goal is to extract index data from a PDF book (the book does not have a proper index, and one is badly needed). That means I want to detect the fonts/sizes of headings and sub-headings, then extract their text along with the page number, and finally print that information to a text file.


How can I do that?
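One way to approach this is in two stages. First, get text together with its font size and page number out of the PDF. In PDFBox 2.x this is commonly done by subclassing `PDFTextStripper` and overriding `writeString(String text, List<TextPosition> textPositions)`. Each `TextPosition` reports its font (`getFont()`) and size (`getFontSizeInPt()`), and `getCurrentPageNo()` gives the page being processed (check these against the PDFBox version you use). Second, keep only the runs whose size matches a heading size and print them as an index. The sketch below covers only that second stage, using plain Java 11+ with no PDFBox dependency. `TextRun`, `Entry`, `buildIndex` and `format` are hypothetical names, and the heading sizes (18 pt for chapters, 14 pt for sections) and the sample runs are made-up assumptions that would have to be measured in the actual book.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingIndex {

    /** A run of text in a single font size, as a text extractor might report it (hypothetical type). */
    static final class TextRun {
        final String text;
        final float fontSizePt;
        final int page;

        TextRun(String text, float fontSizePt, int page) {
            this.text = text;
            this.fontSizePt = fontSizePt;
            this.page = page;
        }
    }

    /** One index line: heading level (1 = top level), heading text and page number. */
    static final class Entry {
        final int level;
        final String text;
        final int page;

        Entry(int level, String text, int page) {
            this.level = level;
            this.text = text;
            this.page = page;
        }
    }

    /**
     * Keeps only runs whose font size matches one of the heading sizes (within tolerancePt).
     * headingSizesPt[0] is the top level, headingSizesPt[1] the next level, and so on.
     */
    static List<Entry> buildIndex(List<TextRun> runs, float[] headingSizesPt, float tolerancePt) {
        List<Entry> index = new ArrayList<>();
        for (TextRun run : runs) {
            String text = run.text.strip();
            if (text.isEmpty()) {
                continue; // skip whitespace-only runs
            }
            for (int level = 0; level < headingSizesPt.length; level++) {
                if (Math.abs(run.fontSizePt - headingSizesPt[level]) <= tolerancePt) {
                    index.add(new Entry(level + 1, text, run.page));
                    break; // first matching heading size wins
                }
            }
        }
        return index;
    }

    /** Renders the index as "text ... page" lines, indenting two spaces per sub-level. */
    static String format(List<Entry> index) {
        StringBuilder sb = new StringBuilder();
        for (Entry e : index) {
            sb.append("  ".repeat(e.level - 1))
              .append(e.text).append(" ... ").append(e.page).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample runs; real ones would come from the PDF text extractor.
        List<TextRun> runs = List.of(
                new TextRun("1 Introduction", 18f, 1),
                new TextRun("Some body text.", 10f, 1),
                new TextRun("1.1 Background", 14f, 2),
                new TextRun("More body text.", 10f, 2),
                new TextRun("2 Methods", 18f, 5));
        // Assumed heading sizes: 18 pt chapters, 14 pt sections.
        System.out.print(format(buildIndex(runs, new float[] {18f, 14f}, 0.5f)));
    }
}
```

Run as-is, it prints the three headings with "1.1 Background" indented under "1 Introduction", and the body text dropped. To get a text file instead of console output, the formatted string can be written with `java.nio.file.Files.writeString(Path.of("index.txt"), ...)`. Matching on size alone is the simplest choice; if body text and headings share a size, comparing the font name (for example a bold variant) as well would be the natural next refinement.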


Juhan
