Problem with mixed RTL/LTR pdfs

Amir H. Jadidinejad Sat, 02 Aug 2014 13:46:28 -0700

Hi,
I can extract the content of monolingual PDF files using the following code:
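        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setSortByPosition(true); // order text by its position on the page
        PDDocument doc = PDDocument.load(file);
        String txt;
        try {
            txt = stripper.getText(doc);
        } finally {
            doc.close(); // release the document even if extraction fails
        }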



This works perfectly when the input document is monolingual.

The problem is that when the input document mixes right-to-left and
left-to-right languages, the characters of one of the languages come out reversed!
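One possible workaround, offered only as a sketch: assuming the stripper emits each line in visual order (glyphs sorted left to right by position, which is how RTL text is commonly laid out in a PDF), each line can be passed through the JDK's java.text.Bidi to restore reading order. It reverses the characters inside right-to-left runs and then reorders the runs by embedding level. BidiLineFixer and visualToLogical are hypothetical names, not part of PDFBox; bracket mirroring and deeply nested embeddings are not handled.

```java
import java.text.Bidi;

// Hypothetical helper, not PDFBox API. Assumes each input line is in
// visual (left-to-right by position) order, as produced by sorting by position.
class BidiLineFixer {

    static String visualToLogical(String line) {
        // Base direction is taken from the first strong character in the line.
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return line; // purely left-to-right: nothing to reorder
        }
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            String run = line.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            levels[i] = (byte) bidi.getRunLevel(i);
            // Odd embedding level means a right-to-left run: reverse its characters.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Reorder the runs themselves according to their embedding levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }
}
```

To use it, the String returned by getText() could be split on line breaks and each line passed through visualToLogical before joining the lines back together. Lines containing only left-to-right text are returned unchanged.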

A sample bilingual PDF document is attached.

Could you please help me with this issue?

Thanks.


