Do you mean bookmarks (the document outline)? If the book doesn't have proper bookmarks, then you can't get the bookmark information from the PDF dictionary. But from your description, it sounds like you want to go through all the text and figure out which parts are headings. Theoretically it's doable. From my point of view, you need to extend the stream engine (PDFStreamEngine) and then parse the text.
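As a rough illustration of the "figure out which parts are headings" step, here is a minimal sketch of one possible heuristic. It assumes headings are set in a larger font than body text, treats the font size covering the most characters as body text, and ranks larger sizes as heading levels. The `TextRun` record and the `bodySize`/`buildIndex` helpers are hypothetical and not part of PDFBox. In PDFBox 2.x the raw values could plausibly be collected by subclassing `PDFTextStripper`, overriding `writeString(String, List<TextPosition>)`, and reading `TextPosition.getFontSizeInPt()` together with `getCurrentPageNo()`; that wiring is left out so the sketch runs with only the standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class HeadingIndex {

    // Hypothetical record: one run of text set in a single font size on one page.
    // With PDFBox these values might come from TextPosition.getFontSizeInPt()
    // and PDFTextStripper.getCurrentPageNo() inside an overridden writeString().
    public record TextRun(String text, float fontSize, int page) {}

    // Assumption: the font size that covers the most characters is body text.
    static float bodySize(List<TextRun> runs) {
        Map<Float, Integer> chars = new HashMap<>();
        for (TextRun r : runs) {
            chars.merge(r.fontSize(), r.text().length(), Integer::sum);
        }
        float best = 0f;
        int bestCount = -1;
        for (Map.Entry<Float, Integer> e : chars.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Every run larger than the body size is treated as a heading;
    // the largest size becomes level 0 (no indent), the next size level 1, etc.
    public static List<String> buildIndex(List<TextRun> runs) {
        float body = bodySize(runs);
        TreeSet<Float> headingSizes = new TreeSet<>();
        for (TextRun r : runs) {
            if (r.fontSize() > body) headingSizes.add(r.fontSize());
        }
        List<Float> levels = new ArrayList<>(headingSizes.descendingSet());
        List<String> index = new ArrayList<>();
        for (TextRun r : runs) {
            if (r.fontSize() > body) {
                int level = levels.indexOf(r.fontSize());
                index.add("  ".repeat(level) + r.text().trim() + " ... " + r.page());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<TextRun> runs = List.of(
            new TextRun("Chapter 1 Introduction", 18f, 1),
            new TextRun("Some body text that runs for a while.", 10f, 1),
            new TextRun("1.1 Background", 14f, 2),
            new TextRun("More body text on page two.", 10f, 2));
        buildIndex(runs).forEach(System.out::println);
    }
}
```

Running `main` prints `Chapter 1 Introduction ... 1` followed by `  1.1 Background ... 2`. Real books often mix bold body text, running headers, and page numbers, so font name and position will likely need to be taken into account too.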
On March 11, 2012 at 3:24 AM, Juhan Voolaid <[email protected]> wrote:
> Hello
>
> I am totally new to the insides of PDF structure and also the PDFBox library.
> My goal is to extract index data from a PDF book (the book does not have a
> proper index, but one is badly needed). That means I want to detect the
> fonts/sizes of headings and sub-headings, then extract their text along
> with page numbers, and later just print that information to a text file.
>
> How can I do that?
>
> Juhan

--
Isaac Tian

