Hello. I was able to fix the problem by applying the patch attached to PDFBOX-283.
I also did not notice any unwanted side effects. I added my vote to have this patch applied to trunk.

https://issues.apache.org/jira/browse/PDFBOX-283

Pasi Koski

From: Koski Pasi
Sent: 11 April 2014 11:09
To: '[email protected]'
Subject: Non-Ascii characters messed up in AcroForm (PdbBox 1.8.4)

Hi.

I'm working on a Java server-side application that produces PDF forms pre-filled by the application. These documents are delivered to the end user via a browser interface, after which the end user continues to edit the forms. Usually the forms are then printed by the end user or just saved electronically. No additional processing of the user input by the application is needed, although this may be a future scenario.

The problem is with displaying non-ASCII characters in editable fields. When the data entered by the application in a form field contains non-ASCII characters, they do not show up correctly once the document is opened in a PDF viewer. However, when the field is selected, the content is displayed correctly. If the data is changed, it continues to display correctly after another field is selected, but if it is left unchanged, the non-ASCII characters return to the messed-up state when the user moves out of the field.

I'm using PDFBox 1.8.4, but I had the same problem with the previous version (1.8.3). I have not tried earlier versions.

Can anyone tell me if non-ASCII characters are supposed to work properly in an AcroForm field? What requirements does this place on the PDF template? Do I need to encode the data before setting it as the value of the PDField? If so, which encoding method should I use?

Below is a simplified code sample of what I'm doing, from end to end. I've tried various alternatives for setting the encoding of the field value, and I've made attempts to control the font setting via the DA dictionary parameter, but with no success.
In most cases the read-only value turned out invisible, while selecting the field would display the data correctly.

    // MyPdfCreator:
    String TEMPLATE_NAME = "Form_13349A.pdf";
    InputStream is = this.getClass().getClassLoader().getResourceAsStream(TEMPLATE_NAME);
    pdfTemplate = PDDocument.load(is);
    PDDocumentCatalog docCatalog = pdfTemplate.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    PDField field = acroForm.getField("Field1");
    String valueWithNonAsciiChars = "ÄÅÖöäå";
    field.setValue(valueWithNonAsciiChars);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    pdfTemplate.save(byteArrayOutputStream);
    pdfTemplate.close();
    byte[] pdf = byteArrayOutputStream.toByteArray();

    // MyHttpRequestHandler:
    ByteArrayOutputStream baos = new ByteArrayOutputStream(pdf.length);
    baos.write(pdf, 0, pdf.length);
    resourceResponse.setContentType("application/pdf");
    resourceResponse.addProperty(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=Form_13349A.pdf");
    resourceResponse.setContentLength(baos.size());
    OutputStream out = resourceResponse.getPortletOutputStream();
    baos.writeTo(out);
    out.flush();
    out.close();

Every hint I've found on the Internet suggests that it's a font-related problem. But frankly, it seems like PDFBox is messing up the text field properties while setting the value. I found a couple of descriptions matching my problem, but no solution. The PDFBOX-283 issue seems to describe the same problem, and there is even a patch attached, but apparently the fix has other unwanted side effects; otherwise, why was it not added to the latest version? I have not tested the patch yet, but I probably will shortly.
https://issues.apache.org/jira/browse/PDFBOX-283

As a temporary fix, I was able to produce a successful result by editing the template PDF and setting the Custom Format Script (that's what Adobe XI calls it) of the field like so:

    var txtField = event.target;
    txtField.textFont = font.Helv;
    txtField.textColor = color.black;

HOWEVER, this only works with Adobe Reader, not with the built-in viewers in Chrome or Firefox. Plus, this is not a very nice fix, since it requires the PDF template designer to remember to copy the script into the Custom Format Script entry for each and every field in each and every PDF template. Most importantly, though, the solution should support every major PDF viewer.

Help would be very much appreciated!

Pasi Koski
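
The question about encoding the field value can be made concrete with a small diagnostic that uses only the Java standard library. This is a rough sketch, not part of the original code and not a fix: it prints the bytes of the sample value under three charsets, so it is easy to see how a writer and a reader that disagree on the encoding would garble exactly these characters. It does not use PDFBox and says nothing about what PDFBox 1.8.4 does internally. The class name NonAsciiProbe and its methods are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of PDFBox.
final class NonAsciiProbe {

    /** Returns true if any character lies outside the 7-bit ASCII range. */
    public static boolean containsNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    /** Renders the bytes of s in the given charset as space-separated hex. */
    public static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six characters of the sample field value, written as
        // Unicode escapes so the source file stays pure ASCII.
        String value = "\u00C4\u00C5\u00D6\u00F6\u00E4\u00E5";

        System.out.println("non-ASCII present: " + containsNonAscii(value));
        System.out.println("ISO-8859-1: " + hex(value, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + hex(value, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   " + hex(value, StandardCharsets.UTF_16BE));
    }
}
```

Each of the six characters is a single byte in ISO-8859-1 and two bytes in UTF-8 or UTF-16BE. Comparing these byte sequences with the raw field value and the field's appearance stream in the saved PDF may help narrow down where the text goes wrong.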

