Re: Extract Text from page object?

Hartmann Toël Thu, 11 May 2017 04:49:11 -0700

(a) yes
(b) yes

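very basic example code (assumes PDFBox 2.x and that `file` is a java.io.File):
            // needs java.io.StringWriter, org.apache.pdfbox.pdmodel.PDDocument
            // and org.apache.pdfbox.text.PDFTextStripper
            StringWriter out = new StringWriter();
            PDDocument doc = PDDocument.load(file);
            int nbPages = doc.getNumberOfPages();
            PDFTextStripper stripper = new PDFTextStripper();
            // page numbers are 1-based and the end page is inclusive, so this
            // extracts page 1 only; loop p = 1..nbPages and set both values
            // to p to go through the document page by page
            stripper.setStartPage(1);
            stripper.setEndPage(1);
            stripper.writeText(doc, out);
            String txt = out.toString().trim();
            out.close();
            doc.close();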
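
On the point about words that split between pages: once each page's text has been extracted into its own string, the strings can be stitched back together. The following is only a rough sketch of one heuristic, not anything PDFBox provides. `PageJoiner` and `joinPages` are hypothetical names, and the rule (a trailing hyphen after a letter, followed by a page that starts with a lowercase letter) is an assumption that will also merge genuine hyphenated compounds.

```java
import java.util.List;

// Hypothetical helper, not part of PDFBox: joins per-page text and
// repairs a word that was hyphenated across a page break.
final class PageJoiner {
    private PageJoiner() {}

    static String joinPages(List<String> pages) {
        StringBuilder sb = new StringBuilder();
        for (String raw : pages) {
            String page = raw.trim();
            if (page.isEmpty()) {
                continue; // skip pages with no extractable text
            }
            int len = sb.length();
            if (len == 0) {
                sb.append(page);
            } else if (len >= 2
                    && sb.charAt(len - 1) == '-'
                    && Character.isLetter(sb.charAt(len - 2))
                    && Character.isLowerCase(page.charAt(0))) {
                // "extrac-" followed by "tion ..." becomes "extraction ..."
                sb.setLength(len - 1);
                sb.append(page);
            } else {
                // otherwise keep the page boundary as a line break
                sb.append('\n').append(page);
            }
        }
        return sb.toString();
    }
}
```

The list passed to `joinPages` would be filled by running the stripper once per page, with the start page and end page set to the same number each time.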


Please check the sample code included in PDFBox for better examples.

Best regards
Toël Hartmann

On 11 May 2017, at 12:47, David Patterson <patterd20...@gmail.com> wrote:

> Is it possible to
> (a) iterate over the PDF by page [I believe the answer is "Yes"]
> (b) extract the text from a page [Don't know]
> 
> This would allow some nice capabilities, but with an added complexity of
> words that split between pages.
> 
> Thanks for the info.
> 
> Dave Patterson


