Dear Mark,

Thanks for your reply. Unfortunately, I don't understand how your post relates to my question. I'm a newbie in PDFBox; could you please explain how to extract the position of a specific paragraph using the attached code? It seems to work with "fields" in the input PDF file, but I'm looking for paragraphs. How are the two related?

Kind regards,
Amir
________________________________
From: "Strein, Mark C CIV USARMY TRADOC ANALYSIS CTR (US)" <[email protected]>
To: "[email protected]" <[email protected]>; Amir H. Jadidinejad <[email protected]>
Sent: Monday, August 4, 2014 3:52 PM
Subject: RE: How to find the position of a specific paragraph in the input PDF? (UNCLASSIFIED)

Classification: UNCLASSIFIED
Caveats: NONE

Morning Sir,

The basic construct for extracting the value in a field is:

    // compare names first; equalsIgnoreCase returns a boolean, not the field
    if (field.getFullyQualifiedName().equalsIgnoreCase(fullyQualifiedName)) {
        value = field.getValue(); // value of the matching field
    }

Note: I use fully qualified names (FQN) to prevent errors.

My way of extracting the FQN is as follows (the short version):

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;

    // Recursively walks a field and its kids; terminal fields (no kids) are printed or processed
    private void processField(PDField field, boolean buildPDList) throws IOException {
        List kids = field.getKids();
        if (kids != null) {
            Iterator kidsIter = kids.iterator();
            while (kidsIter.hasNext()) {
                Object pdfObj = kidsIter.next();
                if (pdfObj instanceof PDField) {
                    PDField kid = (PDField) pdfObj;
                    processField(kid, buildPDList);
                }
            }
        } else {
            if (!buildPDList) {
                System.err.println(field.getFullyQualifiedName());
            } else {
                // other processing
            }
        }
    }

Hope that helps.

V/R,
Mark Strein

-----Original Message-----
From: Amir H. Jadidinejad [mailto:[email protected]]
Sent: Sunday, August 03, 2014 8:53 PM
To: user pdfbox
Subject: How to find the position of a specific paragraph in the input PDF?

I want to extract the content of a PDF file using the PDFBox library. The content should be processed paragraph by paragraph, and for each paragraph I need its position for follow-up processing.

Using the following code, I can extract the whole content of an input PDF:

    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    String txt = stripper.getText(doc);
    doc.close();

I have two problems:

1. I don't know how to extract the content paragraph by paragraph.
2. I don't know how to store the position of a paragraph for follow-up processing (for example, highlighting).

Thanks.
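For problems 1 and 2, a minimal sketch, assuming PDFBox 2.x. PDFTextStripper returns plain text, not paragraph objects, so one approach is to subclass it, override its protected writeString(String text, List<TextPosition> textPositions) method to collect the glyph coordinates of each text chunk (for example x from getXDirAdj(), the top edge from getYDirAdj() minus getHeightDir(), plus getWidthDirAdj(), getHeightDir() and getCurrentPageNo()), assemble those into lines, and then group the lines into paragraphs yourself. Calling setSortByPosition(true) on the stripper may help keep lines in reading order. The grouping step is shown here in plain Java so it runs without PDFBox; the Line and Paragraph records, the group method and the gapFactor threshold are hypothetical names, and the vertical-gap rule is a simple illustrative heuristic, not PDFBox's own paragraph detection.

```java
import java.util.ArrayList;
import java.util.List;

final class ParagraphGrouper {

    /** Hypothetical line of text: y is the TOP edge, measured downward from the top of the page. */
    record Line(int page, String text, float x, float y, float width, float height) {}

    /** Hypothetical paragraph: joined text plus the union bounding box of its lines. */
    record Paragraph(int page, String text, float x, float y, float width, float height) {}

    /**
     * Groups consecutive lines into paragraphs. A new paragraph starts when the page
     * changes or when the gap between the bottom of the previous line and the top of
     * the current line exceeds gapFactor times the previous line's height.
     */
    static List<Paragraph> group(List<Line> lines, float gapFactor) {
        List<Paragraph> result = new ArrayList<>();
        List<Line> current = new ArrayList<>();
        for (Line line : lines) {
            if (!current.isEmpty()) {
                Line prev = current.get(current.size() - 1);
                float gap = line.y() - (prev.y() + prev.height());
                if (line.page() != prev.page() || gap > gapFactor * prev.height()) {
                    result.add(merge(current)); // close the paragraph collected so far
                    current = new ArrayList<>();
                }
            }
            current.add(line);
        }
        if (!current.isEmpty()) {
            result.add(merge(current)); // close the last paragraph
        }
        return result;
    }

    /** Joins the lines' text with spaces and computes the union of their boxes. */
    private static Paragraph merge(List<Line> lines) {
        float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
        StringBuilder text = new StringBuilder();
        for (Line l : lines) {
            minX = Math.min(minX, l.x());
            minY = Math.min(minY, l.y());
            maxX = Math.max(maxX, l.x() + l.width());
            maxY = Math.max(maxY, l.y() + l.height());
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(l.text());
        }
        return new Paragraph(lines.get(0).page(), text.toString(),
                minX, minY, maxX - minX, maxY - minY);
    }
}
```

For example, with three 12-point lines on page 1 whose tops are at y = 100, 114 and 150 and gapFactor = 1.0, the first two lines form one paragraph and the third starts a new one. The per-paragraph page number and box are what you would store for later highlighting; because PDF user space has its origin at the bottom-left, the top-down y would need converting back (roughly pageHeight - y - height, assuming the page box starts at 0,0).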

