Hi,
I can extract the content of a monolingual PDF file using the following code:
        PDFTextStripper stripper = new PDFTextStripper();
        PDDocument doc = PDDocument.load(file);
        stripper.setSortByPosition(true);
        String txt = stripper.getText(doc);
        doc.close();


This works perfectly when the input document is monolingual.

The problem is that when the input document combines right-to-left 
and left-to-right languages, the characters of one language are reversed in the output!
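
To make "a combination of right-to-left and left-to-right" concrete: whether an
extracted line mixes both directions can be checked with the JDK's
java.text.Bidi class. The snippet below is only a minimal sketch of that check.
It is not part of PDFBox and does not fix the reversal; the class name
BidiCheck, the helper isMixed and the sample strings are hypothetical and are
not taken from the attached file.

```java
import java.text.Bidi;

class BidiCheck {
    // Returns true when the line contains both left-to-right and
    // right-to-left runs according to the JDK's bidi analysis.
    static boolean isMixed(String line) {
        // Base direction falls back to left-to-right if the line has no strong characters.
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isMixed();
    }

    public static void main(String[] args) {
        // Hypothetical sample strings (assumption, not from test.pdf).
        String latinOnly = "hello world";
        String latinAndHebrew = "abc \u05D0\u05D1\u05D2";
        System.out.println(isMixed(latinOnly));
        System.out.println(isMixed(latinAndHebrew));
    }
}
```

Running such a check on each line of the extracted text would show which lines
are affected, which may help narrow down where the reversal happens.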

A sample bilingual PDF document is attached.

Would you please help me with this issue?

Thanks.

Attachment: test.pdf
Description: application/text-plain
