I am using the current SVN code (PDFBox 1.0.1-SNAPSHOT). I have seen some new and updated code in the org.apache.pdfbox.pdmodel.documentinterchange.logicalstructure and org.apache.pdfbox.pdmodel.documentinterchange.markedcontent packages since the official 1.0.0 release.

I have created marked_content.pdf using iText code. Acrobat shows the tags correctly. The structure is like this:

(Top level) "Everything" (a Sect)

It has three <p> children. I will call them P1, P2 and P3, although they are actually all just "P"s. Each "P" has some text in it.

P1:
1It was the best of times, it was the worst of times,
2it was the age of wisdom, it was the age of foolishness,
3it was the epoch of belief, it was the epoch of incredulity,
4it was the season of Light, it was the season of Darkness,
5it was the spring of hope, it was the winter of despair.

P2:
1We had everything before us, we had nothing before us,
2we were all going direct to Heaven, we were all going direct
3the other way-in short, the period was so far like the present
4period, that some of its noisiest authorities insisted on its
5being received, for good or for evil, in the superlative degree
6of comparison only.

P3:
It was the best of times.

How do we read the text back using PDFBox (newest SVN code)?

I have been using

    PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();

and

    PDStructureTreeRoot structureTreeRoot = docCatalog.getStructureTreeRoot();

I have looked at the content of structureTreeRoot, but I cannot find the text content there. Could anyone kindly tell me how to get to the text?

Thanks,
Tom

Chenping Ni | Infogix, Inc.

