I want to extract the content of a PDF file using the PDFBox library. The 
content should be processed paragraph by paragraph, and for each paragraph I 
need its position for follow-up processing. With the following code, I can 
extract the whole content of an input PDF:

PDDocument doc = PDDocument.load(file);
PDFTextStripper stripper = new PDFTextStripper();
String txt = stripper.getText(doc);
doc.close();

I have two problems:

    1. I don't know how to extract the content paragraph by paragraph.
    2. I don't know how to store the position of a paragraph for follow-up 
       processing (for example, highlighting).
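
To make the two problems concrete, here is a rough sketch in plain Java of 
the result I am after. It assumes the text is already available line by line 
with a bounding box for each line (how to get that out of PDFBox is exactly 
what I don't know). The names ParagraphGrouper, Line, Paragraph and group are 
hypothetical and not part of PDFBox, and the gap-based rule is only one 
possible heuristic for detecting a paragraph break.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of PDFBox: groups positioned lines into paragraphs.
class ParagraphGrouper {

    // One text line with its box; assumes a top-left origin, y growing downward.
    static final class Line {
        final int page;
        final float x, y, width, height;
        final String text;

        Line(int page, float x, float y, float width, float height, String text) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.text = text;
        }
    }

    // A paragraph: joined text plus the union of its lines' boxes.
    static final class Paragraph {
        final int page;
        float left, top, right, bottom;
        final StringBuilder text = new StringBuilder();

        Paragraph(Line first) {
            page = first.page;
            left = first.x;
            top = first.y;
            right = first.x + first.width;
            bottom = first.y + first.height;
            text.append(first.text);
        }

        void add(Line line) {
            left = Math.min(left, line.x);
            top = Math.min(top, line.y);
            right = Math.max(right, line.x + line.width);
            bottom = Math.max(bottom, line.y + line.height);
            text.append(' ').append(line.text);
        }
    }

    // Assumes lines arrive in reading order. A new paragraph starts on a page
    // change or when the vertical gap to the previous line exceeds
    // gapFactor * (previous line height).
    static List<Paragraph> group(List<Line> lines, float gapFactor) {
        List<Paragraph> paragraphs = new ArrayList<>();
        Paragraph current = null;
        Line previous = null;
        for (Line line : lines) {
            boolean startNew = current == null
                    || line.page != previous.page
                    || line.y - (previous.y + previous.height) > gapFactor * previous.height;
            if (startNew) {
                current = new Paragraph(line);
                paragraphs.add(current);
            } else {
                current.add(line);
            }
            previous = line;
        }
        return paragraphs;
    }
}
```

With something like this, each Paragraph would carry the page number and a 
rectangle (left, top, right, bottom) that could be stored and used later, for 
example to draw a highlight over that area.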

Thanks.
