Re: Extract headings from pdf

Isaac@Gmail Sun, 11 Mar 2012 12:01:21 -0700

Do you mean bookmarks? If the book doesn't have proper bookmarks, then you
can't get the bookmark information from the PDF's outline dictionary. But from
your description, it sounds like you want to go through all the text and figure
out which parts are headings? In theory it's doable. From my point of view, you
need to extend the stream engine (PDFStreamEngine) and then parse the text.



On Mar 11, 2012 at 3:24 AM, Juhan Voolaid <[email protected]> wrote:

> Hello
>
> I am totally new to the internals of the PDF structure and also to the PDFBox
> library. My goal is to extract index data from a PDF book (the book does not
> have a proper index, but one is badly needed). That means I want to detect the
> fonts/sizes of headings and sub-headings, then extract their text along
> with the page number, and later just print that information to a text file.
>
> How to do that?
>
>
> Juhan
>
>
>


-- 
Isaac Tian
