Extracting text between two bookmarks using Apache PDFBox

Shriram Mon, 05 Mar 2012 23:41:00 -0800

I am using Apache PDFBox to read a PDF document whose hierarchy is defined by
its bookmarks. The hierarchy is a tree, and content appears only at the leaf
level. When I try to extract the text between two leaf-level bookmarks (using
Stripper.setStartBookmark(), Stripper.setEndBookmark() and Stripper.writeText()),
I get the text of the whole page instead. In short, my problem is similar to the
one described here:
http://www.java-forums.org/advanced-java/51032-pdox-1-6-0-extract-text-between-2-bookmarks-same-page-sos.html
Is there a way to extract only the content between two bookmarks? If so, what
should I change in my code?
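
A possible workaround, sketched under assumptions and not an official PDFBox recipe: as far as I can tell, PDFTextStripper uses a bookmark only to pick the page it points to, which would explain why the whole page comes back. If each leaf bookmark resolves to a PDPageXYZDestination, its getTop() value gives a vertical position on that page, and the text between two such positions could be cut out with PDFTextStripperByArea. The pure-Java helper below shows only the coordinate step. The class BookmarkRegions, the records Anchor and Region, and the method regionsBetween are hypothetical names, not PDFBox API. The helper turns a start position and an end position into one full-width band per page, flipping PDF's bottom-up y axis into distances from the top of the page. It assumes unrotated pages whose crop box starts at y = 0.

```java
import java.util.ArrayList;
import java.util.List;

final class BookmarkRegions {

    // Hypothetical holder: a bookmark reduced to its 0-based page index and the
    // "top" value of its XYZ destination (PDF user space, origin bottom-left).
    record Anchor(int pageIndex, float top) {}

    // Hypothetical full-width band on one page, measured down from the top edge.
    record Region(int pageIndex, float y, float height) {}

    // Returns the bands lying between the start and end anchors.
    // pageHeights[i] is the height of page i (assumed: unrotated, crop box at y = 0).
    static List<Region> regionsBetween(Anchor start, Anchor end, float[] pageHeights) {
        if (end.pageIndex() < start.pageIndex()) {
            throw new IllegalArgumentException("end bookmark precedes start bookmark");
        }
        List<Region> regions = new ArrayList<>();
        for (int p = start.pageIndex(); p <= end.pageIndex(); p++) {
            float h = pageHeights[p];
            // Flip bottom-up PDF y into a distance from the top of the page.
            // First page starts at the start anchor; later pages start at the top.
            float from = (p == start.pageIndex()) ? h - start.top() : 0f;
            // Last page stops at the end anchor; earlier pages run to the bottom.
            float to = (p == end.pageIndex()) ? h - end.top() : h;
            if (to > from) {
                regions.add(new Region(p, from, to - from));
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        // Two bookmarks on the same 792-pt-high page: start at y = 700, end at y = 300.
        List<Region> regions =
                regionsBetween(new Anchor(0, 700f), new Anchor(0, 300f), new float[] {792f});
        System.out.println(regions);
    }
}
```

Each Region could then be registered with PDFTextStripperByArea.addRegion(name, new Rectangle2D.Float(0, y, pageWidth, height)), followed by extractRegions(page) and getTextForRegion(name) on the matching PDPage. Bookmarks whose destinations carry no top value (for example Fit destinations), or that use named destinations or GoTo actions, would need to be resolved differently; this sketch does not cover them.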
