Hello,

I'm having some difficulties using PDFBox. It does not behave the way I expect, and I can't work out what the problem is. I'm trying to build a PDF translation app on top of a translation engine. The idea is: upload a PDF, click a button, get the translated PDF back.

The problem is that PDFBox messes up the characters. I tried the ReplaceString.java example on a Romanian newspaper PDF, replacing one string with another, and PDFBox seems to mangle the diacritics. After the replacement, the newly created PDF file shows the following:
  ́„ instead of „
  ́” instead of ” (the leading quote should not be there; Romanian quotation looks like „quoted text”)
  ^fi instead of î (i circumflex)
  ~ plus another character that did not display (it showed as an empty box) instead of ă (a with breve)

If I replace string A with string B and string B contains diacritics, none of string B's diacritics are displayed correctly. The same diacritics (ă, ș and ț) in other parts of the document are still displayed correctly, apart from the exceptions above.

What can I do to get a correct PDF as output? My guess is that I have to supply the correct characters myself because the PDF standard, AFAIK, does not support Romanian diacritics (which are ă â î ș ț):

Character  Unicode name                              Code point  Glyph name
Ă          Latin capital letter A with breve         0102        Abreve
ă          Latin small letter A with breve           0103        abreve
Â          Latin capital letter A with circumflex    00C2        Acircumflex
â          Latin small letter A with circumflex      00E2        acircumflex
Î          Latin capital letter I with circumflex    00CE        Icircumflex
î          Latin small letter I with circumflex      00EE        icircumflex
Ș          Latin capital letter S with comma below   0218        Scommaaccent
ș          Latin small letter S with comma below     0219        scommaaccent
Ț          Latin capital letter T with comma below   021A        uni021A
ț          Latin small letter T with comma below     021B        uni021B

Windows operating systems (up to and including Windows XP) have a default, wrong mapping for the Romanian characters, which is:

Character  Unicode name                              Code point  Glyph name
Ş          Latin capital letter S with cedilla       015E        Scedilla
ş          Latin small letter S with cedilla         015F        scedilla
Ţ          Latin capital letter T with cedilla       0162        uni0162
ţ          Latin small letter T with cedilla         0163        uni0163

I hope this can be done easily and documented.

Thanks, and happy holidays!

--
-stan ioan-eugen
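
P.S. To make the cedilla versus comma-below difference from the two character lists above concrete, here is a minimal sketch in plain Java that rewrites the four cedilla code points to their comma-below counterparts before the replacement text is handed to anything else. The class and method names (RomanianDiacritics, toCommaBelow) are made up for illustration and are not part of PDFBox. It only normalizes the Java string; whether the font embedded in the PDF actually has glyphs for the resulting characters is a separate question that this sketch does not address.

```java
import java.util.Map;

public class RomanianDiacritics {

    // Cedilla code points (the legacy Windows mapping) paired with the
    // comma-below code points listed in the tables above.
    private static final Map<Character, Character> CEDILLA_TO_COMMA = Map.of(
            '\u015E', '\u0218',  // capital S with cedilla -> capital S with comma below
            '\u015F', '\u0219',  // small s with cedilla   -> small s with comma below
            '\u0162', '\u021A',  // capital T with cedilla -> capital T with comma below
            '\u0163', '\u021B'); // small t with cedilla   -> small t with comma below

    // Hypothetical helper: returns a copy of the text with the four cedilla
    // letters replaced; every other character is passed through unchanged.
    public static String toCommaBelow(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = CEDILLA_TO_COMMA.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample input written with cedilla letters: s-cedilla, t-cedilla, a-breve.
        String legacy = "\u015Ftiin\u0163\u0103";
        String fixed = toCommaBelow(legacy);
        // Print the code points so the result does not depend on console encoding.
        fixed.chars().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The map goes in one direction only. If the font in a given PDF turns out to contain only the cedilla glyphs, the same idea would work with the keys and values swapped, but that is an assumption I have not checked against any particular file.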

