Hi, I can extract the content of monolingual PDF files using the following code:

    PDFTextStripper stripper = new PDFTextStripper();
    PDDocument doc = PDDocument.load(file);
    stripper.setSortByPosition(true);
    String txt = stripper.getText(doc);
    doc.close();

This works perfectly when the input document is monolingual. The problem is that when the input document combines right-to-left and left-to-right languages, the output characters of one language are reversed! A sample bilingual PDF document is attached. Would you please help me with this issue? Thanks.
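
P.S. In case it helps with reproducing the issue: below is a minimal sketch of how the extracted string could be checked for mixed directions using the JDK's java.text.Bidi class. It is not part of PDFBox and it does not fix the reversal; it only reports which runs of the text the JDK considers left-to-right or right-to-left. The class name BidiRuns, the method names, and the sample string are made up for illustration.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not a PDFBox API.
public class BidiRuns {

    /** Returns true if the text contains both LTR and RTL runs. */
    public static boolean isMixed(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isMixed();
    }

    /** Lists each directional run, in logical order, tagged "LTR" or "RTL". */
    public static List<String> describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Even embedding levels are left-to-right, odd levels are right-to-left.
            String dir = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            String run = text.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            runs.add(dir + ": " + run);
        }
        return runs;
    }

    public static void main(String[] args) {
        // Assumed sample: Latin text followed by a Hebrew word.
        // In practice this would be the string returned by stripper.getText(doc).
        String sample = "PDF \u05E9\u05DC\u05D5\u05DD";
        System.out.println("mixed: " + isMixed(sample));
        for (String run : describeRuns(sample)) {
            System.out.println(run);
        }
    }
}
```

Running this on the output of stripper.getText(doc) should at least show which runs are affected, assuming the reversed characters still belong to the right-to-left script.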
Attachment: test.pdf

