I'm trying to extract the content of a PDF file using the PDFBox library. The content should be processed paragraph by paragraph, and for each paragraph I need its position for follow-up processing. Using the following code, I can extract the whole content of an input PDF:

    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    String txt = stripper.getText(doc);
    doc.close();

I have two problems:

1. I don't know how to extract the content paragraph by paragraph.
2. I don't know how to store the position of each paragraph for follow-up processing (for example, highlighting).

Thanks.
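
One possible direction, as a rough sketch rather than a definitive answer: PDF has no built-in notion of a paragraph, so paragraphs usually have to be inferred from layout. In PDFBox 2.x, a subclass of `PDFTextStripper` can override `writeString(String text, List<TextPosition> textPositions)` to capture text roughly one line at a time, and `TextPosition` exposes `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()` and `getHeightDir()` for coordinates. The code below covers only the grouping step and uses plain Java so it can run without PDFBox. The `Line` and `Paragraph` types, the `group` method and the `gapFactor` threshold are hypothetical names introduced for illustration. It assumes the lines come from a single page, in reading order, with `y` as the top edge of the line growing downward.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphGrouper {

    /** One text line with its bounding box (hypothetical type, filled from TextPosition data). */
    public static final class Line {
        public final String text;
        public final float x, y, width, height; // y = top edge, growing downward (assumption)

        public Line(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** A paragraph: the joined line texts plus the union of the lines' bounding boxes. */
    public static final class Paragraph {
        public final StringBuilder text = new StringBuilder();
        public float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        public float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;

        void add(Line line) {
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(line.text);
            // Grow the bounding box so it covers the new line as well.
            minX = Math.min(minX, line.x);
            minY = Math.min(minY, line.y);
            maxX = Math.max(maxX, line.x + line.width);
            maxY = Math.max(maxY, line.y + line.height);
        }

        public String getText() {
            return text.toString();
        }
    }

    /**
     * Groups lines into paragraphs. A new paragraph starts when the vertical gap
     * between the bottom of the previous line and the top of the current line
     * exceeds gapFactor times the previous line's height.
     */
    public static List<Paragraph> group(List<Line> lines, float gapFactor) {
        List<Paragraph> result = new ArrayList<>();
        Paragraph current = null;
        Line previous = null;
        for (Line line : lines) {
            boolean startNew = previous == null
                    || line.y - (previous.y + previous.height) > gapFactor * previous.height;
            if (startNew) {
                current = new Paragraph();
                result.add(current);
            }
            current.add(line);
            previous = line;
        }
        return result;
    }
}
```

Each resulting `Paragraph` carries both its text and a rectangle (`minX`, `minY`, `maxX`, `maxY`) that could later be used for highlighting. The gap threshold is a heuristic and would likely need tuning per document; indentation-based or multi-column layouts would need additional rules. `PDFTextStripper` also offers `setParagraphStart(String)` and `setParagraphEnd(String)` to mark paragraph boundaries in the extracted text, which may be enough if only the text split is needed and not the coordinates.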

