Hi PdfBox team,
could the following be a bug in PdfBox?
Steps to reproduce
===============
1. Create a document that contains a radio button with an umlaut in its name.
I can provide an example document.
For example: a radio group "Geschlecht" with the buttons "männlich" and
"weiblich". Do not use PdfBox for this step. I used Acrobat Pro 2020.
The name/value of the "männlich" button is encoded as "/m#e4nnlich" in the PDF.
2. Update the value of the radio group with PdfBox to "männlich" and save it to
a new document.
    import java.io.File;
    import org.apache.pdfbox.Loader;
    import org.apache.pdfbox.pdmodel.PDDocument;

    public class UpdateRadioGroup {
        private static final String INPUT_FILE = "form_empty.pdf";
        private static final String OUTPUT_FILE = "form_selected.pdf";
        private static final String FIELD_NAME = "Geschlecht";
        private static final String FIELD_VALUE = "männlich";

        public static void main(String[] args) throws Exception {
            try (PDDocument document = Loader.loadPDF(new File(INPUT_FILE))) {
                document.getDocumentCatalog()
                        .getAcroForm(null)
                        .getField(FIELD_NAME)
                        .setValue(FIELD_VALUE);
                document.save(new File(OUTPUT_FILE));
            }
        }
    }
3. Inspect the name/value of the "männlich" button in the new document with a
text editor. PdfBox encodes "männlich" as "/m#c3#a4nnlich" (see
COSName.writePDF()).
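
To make the difference between the two encodings concrete, here is a minimal
sketch that reproduces both byte sequences. It is not PdfBox code:
escapeName is a hypothetical helper, and for simplicity it escapes every byte
that is not an ASCII letter or digit, which is stricter than the PDF rules
require.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameEscapeSketch {

    // Hypothetical helper (not a PdfBox API): writes a PDF-style name,
    // replacing every byte that is not an ASCII letter or digit with #xx.
    static String escapeName(String name, Charset charset) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : name.getBytes(charset)) {
            int v = b & 0xFF;
            boolean plain = (v >= 'A' && v <= 'Z')
                    || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9');
            if (plain) {
                sb.append((char) v);
            } else {
                sb.append('#').append(String.format("%02x", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "m\u00e4nnlich"; // "männlich"
        // Single-byte encoding, matching what Acrobat wrote in step 1
        System.out.println(escapeName(value, StandardCharsets.ISO_8859_1)); // /m#e4nnlich
        // UTF-8, matching what PdfBox wrote in step 3
        System.out.println(escapeName(value, StandardCharsets.UTF_8));      // /m#c3#a4nnlich
    }
}
```

The same string yields one escaped byte in the single-byte encoding and two in
UTF-8, which matches the two names seen in the files.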
The problem
===============
PdfBox effectively renames the radio button from "männlich" to "mÃ¤nnlich",
that is, from "/m#e4nnlich" to "/m#c3#a4nnlich" in PDF syntax.
When you read the document again, PdfBox converts "#c3#a4" back to "ä", but all
other programs I tried do not. I tested Acrobat Pro 2020, the current Acrobat
Reader, and PDFXplorer from https://www.o2sol.com
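
The mismatch on reading can be sketched the same way. This is only an
illustration, not PdfBox code: decodeName is a hypothetical helper, and I am
assuming that the other programs treat each escaped byte as one character
(ISO-8859-1 is used here as a stand-in for that single-byte interpretation).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameDecodeSketch {

    // Hypothetical helper (not a PdfBox API): turns #xx escapes back into
    // bytes and decodes the resulting byte sequence with the given charset.
    static String decodeName(String escaped, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = escaped.startsWith("/") ? 1 : 0; // skip the leading slash
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '#' && i + 2 < escaped.length()) {
                bytes.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                bytes.write(c); // plain ASCII character
                i++;
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String written = "/m#c3#a4nnlich"; // name as saved by PdfBox
        // Read back as UTF-8: the original value is recovered
        System.out.println(decodeName(written, StandardCharsets.UTF_8));
        // Read back byte by byte (assumed behaviour of the other programs)
        System.out.println(decodeName(written, StandardCharsets.ISO_8859_1));
    }
}
```

Under these assumptions the first call gives "männlich" and the second gives
"mÃ¤nnlich", so a file written by PdfBox only round-trips through PdfBox
itself.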
Thanks
Markus