Hi,

I have a problem extracting plain text from PDF documents that contain Polish characters. I am using the following approach to extract the text:

......
File f = new File(fileName);
PDFParser parser = new PDFParser(new FileInputStream(f));
parser.parse();
COSDocument cosDoc = parser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
String parsedText = pdfStripper.getText(pdDoc);
......

parsedText is then written to a file using UTF-8 encoding.

The above code works fine in most cases: text containing Polish characters is extracted correctly. However, I have found a strange .pdf file for which the above method does not work. The Polish characters are replaced by other characters, e.g. the Polish crossed l (ł) is replaced by %.

Is there any way to fix this problem?

Regards,
Piotr Rychlik
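P.S. For completeness, here is a rough sketch of the write step, using only the JDK. The class name Utf8WriteSketch, the helper names and the temporary file are illustrative assumptions, not the exact code from my application. The second helper lists the code points of a string, which may help to check whether ł (U+0142) is already missing from parsedText before anything is written to disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;

// Hypothetical helper class; names are illustrative only.
class Utf8WriteSketch {

    // Writes the text to the given path, explicitly encoded as UTF-8,
    // so the platform default charset plays no role.
    static void writeUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Renders every code point of the text as U+XXXX, separated by spaces.
    // Useful for seeing what the extraction step actually returned.
    static String describeCodePoints(String text) {
        return text.codePoints()
                   .mapToObj(cp -> String.format("U+%04X", cp))
                   .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for parsedText; in the real program this comes from PDFTextStripper.
        String sample = "z\u0142oty";
        Path out = Files.createTempFile("parsed-text", ".txt");
        writeUtf8(out, sample);
        System.out.println(describeCodePoints(sample));
    }
}
```

If the code points printed for the problematic file already show U+0025 where U+0142 is expected, the replacement happens during extraction and not while writing the file.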

