RE: encoding

Jan Agermose // Conviator ApS Sun, 09 Feb 2014 13:08:19 -0800

Ehm, OK... so the last post gave me an idea. I opened the PDF in the reader, 
filled in æøåÆØÅ in one of the fields, saved the PDF, read it in using PDFBox, 
and wrote out the value as a byte array to see how it is stored. It gave me an 
unexpected result: exactly the same values as when I hardcoded the string 
æøåÆØÅ in Java, converted it to a byte array, and printed out the values. Then 
I tried inserting new values containing ÆØÅ into that one exact field.
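A minimal JDK-only sketch of the byte comparison described above. It assumes the field value has already been read back through PDFBox into a String (that step is omitted), so a hardcoded stand-in is used; `DumpBytes` and `hex` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class DumpBytes {
    // Render each byte as two-digit unsigned hex, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value read back from the form field:
        // the six Danish letters U+00E6, U+00F8, U+00E5, U+00C6, U+00D8, U+00C5,
        // written as escapes so the source file's own encoding cannot alter them.
        String value = "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5";

        System.out.println("UTF-8:   " + hex(value.getBytes(StandardCharsets.UTF_8)));
        // prints "UTF-8:   C3 A6 C3 B8 C3 A5 C3 86 C3 98 C3 85"
        System.out.println("Latin-1: " + hex(value.getBytes(StandardCharsets.ISO_8859_1)));
        // prints "Latin-1: E6 F8 E5 C6 D8 C5"
    }
}
```

For these six letters, Latin-1, windows-1252 and the PDF WinAnsiEncoding all assign the same single-byte codes, so a field whose font uses WinAnsi can in principle represent them.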


This works, but only in that one field. AND the font there is different. 

I'm thinking the real problem is in how the PDF is created in the first place. 
It is made in OpenOffice, exported to PDF from there, and then that PDF is used 
in my code. 

So I'm guessing we should look at how the PDF is created. My coworker is not 
Danish. Maybe his OpenOffice is set to a font that doesn't work for Danish, so 
if he used my OpenOffice (or fixed his), I would not have this problem. 

So that's where we will look. Or I can simply open the PDF in my reader, fill 
in æ in all the fields, save it, and overwrite everything from Java :D 

If it were a bigger job, it might make sense to really understand what's going 
on... 



-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]] 
Sent: 9. februar 2014 19:44
To: [email protected]
Subject: Re: encoding

Hi,

On 08.02.2014 17:31, Jan Agermose // Conviator ApS wrote:
> hi
>
> I'm trying to use this code to fill a document. It works, except for the 
> encoding of the Danish characters æøå:
>
>              PDDocument pdfDocument = PDDocument.load(path);
>              PDType1Font font = PDType1Font.HELVETICA;
>              //contentStream.setFont(font, 12);
>
>              PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
>              PDAcroForm acroForm = docCatalog.getAcroForm();
>
>              // Find the field named "Text1" and fill it with the full name
>              List<PDField> fields = acroForm.getFields();
>              for (PDField field : fields) {
>                  if (field.getFullyQualifiedName().equals("Text1")) {
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn());
>                  }
>              }
>              // Save the filled form to a temporary file, then release the document
>              File f = File.createTempFile("ansoegningsyddanmark", ".pdf");
>              pdfDocument.save(f);
>              pdfDocument.close();
>
>
> I'm also trying to change this:
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn());
> into one of:
>                      field.setValue(p.getFornavn() + " " + p.getEfternavn() + "\0153u");
>                      field.setValue(new String(p.getBy().getBytes("UTF-16"), "ISO8859_1"));
> in order to try to fix it, but it's not working.
>
> Any ideas how to fix this?
Encodings other than WinAnsi aren't supported yet; see PDFBOX-922 [1] for 
further details.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-922
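For reference, a JDK-only sketch of why the two workarounds tried in the quoted message could not help, regardless of the PDFBox limitation. The class and method names are made up for illustration, and the sample letters stand in for the value of `p.getBy()`.

```java
import java.nio.charset.StandardCharsets;

public class AttemptedFixes {
    // Attempt 1: in Java source, "\0153u" is the octal escape \015 (carriage
    // return, U+000D) followed by the characters '3' and 'u'. It never yields a
    // Danish letter; U+00E6 would need a Unicode escape instead.
    static String octalAttempt() {
        return "\0153u";
    }

    // Attempt 2: Java's "UTF-16" charset encodes big-endian and prepends a
    // byte-order mark (FE FF), two bytes per char. Decoding those bytes as
    // ISO-8859-1 maps each byte to its own char, so the result gains the BOM
    // chars and a NUL before every letter instead of fixing anything.
    static String utf16AsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_16), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String escaped = octalAttempt();
        System.out.println(escaped.length() + " chars, first is U+"
                + String.format("%04X", (int) escaped.charAt(0)));
        // prints "3 chars, first is U+000D"

        String mangled = utf16AsLatin1("\u00e6\u00f8\u00e5"); // U+00E6 U+00F8 U+00E5
        System.out.println(mangled.length() + " chars from 3 input letters");
        // prints "8 chars from 3 input letters"
    }
}
```

Both attempts change the Java string before PDFBox ever sees it; neither changes which encoding the field's font uses, which is where the limitation above lies.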
