Hello,
I have found a solution for handling special characters encoded in UTF-16.
The idea is to rebuild the string from the character codes of the target
encoding:
// Look up the target encoding once.
Encoding e = EncodingManager.INSTANCE.getEncoding(COSName.WIN_ANSI_ENCODING);
StringBuilder sencoded = new StringBuilder();
for (char c : nltds.toCharArray()) {
    // character -> glyph name -> code in the target encoding
    sencoded.appendCodePoint(e.getCode(e.getNameFromCharacter(c)));
}
contentStream.drawString(sencoded.toString());
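
As a rough, library-independent illustration of what the per-character lookup above produces, the sketch below maps characters to single-byte codes using the JDK's windows-1252 charset. This rests on an assumption: WinAnsiEncoding is based on that code page, but the two are not guaranteed to agree for every code. The names WinAnsiSketch and toCodes are made up for this example and are not part of PDFBox.

```java
import java.nio.charset.Charset;

class WinAnsiSketch {
    /**
     * Returns the windows-1252 code (0..255) of each character.
     * Characters the charset cannot represent come back as '?' (0x3F).
     */
    static int[] toCodes(String text) {
        byte[] bytes = text.getBytes(Charset.forName("windows-1252"));
        int[] codes = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            codes[i] = bytes[i] & 0xFF; // treat the byte as unsigned
        }
        return codes;
    }

    public static void main(String[] args) {
        // Euro sign, en dash, em dash: the characters reported as garbled in this thread
        for (int code : toCodes("\u20AC\u2013\u2014")) {
            System.out.printf("0x%02X%n", code); // prints 0x80, 0x96, 0x97
        }
    }
}
```

The point is only that each of these characters has a one-byte code well away from its UTF-16 value, which is why writing the unconverted Java string gives garbled output.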
Two other things are needed: add glyphlist.txt to your project, and set
the same encoding on the font (mine is TrueType).
As a side effect, some character names from glyphlist.txt are not defined
in the WinAnsiEncoding class. I personally added nonbreakingspace as a
space. I am not sure this is a good idea: I don't know how the code is
calculated, and I wish I could map the name to its proper character code
instead.
addCharacterEncoding( 040, "space" );
// make a nonbreakingspace a space
addCharacterEncoding( 040, "nonbreakingspace" );
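
A side note on the literal used above, sketched in plain Java: the leading zero makes 040 an octal literal, that is decimal 32 (0x20), the code of the ordinary space. For comparison, the sketch also computes the byte that the JDK's windows-1252 charset assigns to U+00A0. That this matches the WinAnsiEncoding code for nonbreakingspace is an assumption, not something confirmed against PDFBox. The names OctalCodeSketch and cp1252Code are made up for this example.

```java
import java.nio.charset.Charset;

class OctalCodeSketch {
    /** Single-byte windows-1252 code of a character, as an unsigned value. */
    static int cp1252Code(char c) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName("windows-1252"));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        // A leading zero means octal: 040 is 4 * 8 + 0 = 32
        System.out.println(040); // prints 32

        // Code of U+00A0 in windows-1252, shown in octal for comparison with 040
        System.out.println(Integer.toOctalString(cp1252Code('\u00A0'))); // prints 240
    }
}
```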
Anyway, the rebuilt string is rendered correctly in the generated PDF.
The character should be (I think):
U+00A0 NO-BREAK SPACE
General character properties
In Unicode since: 1.1
Unicode category: Separator, space
Various useful representations
UTF-8: 0xC2 0xA0
UTF-16: 0x00A0
UTF-8 as C octal escape: \302\240
XML decimal entity: &#160;
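
The byte values listed above can be checked with a few lines of standard Java. This is only a verification sketch, and the names NbspBytesSketch and hex are made up for it.

```java
import java.nio.charset.StandardCharsets;

class NbspBytesSketch {
    /** Formats bytes as space-separated hex, for example "0xC2 0xA0". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // NO-BREAK SPACE
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_8)));    // prints 0xC2 0xA0
        System.out.println(hex(nbsp.getBytes(StandardCharsets.UTF_16BE))); // prints 0x00 0xA0
    }
}
```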
I hope this is new and will be useful, because I spent a lot of time
searching for a solution.
I would also like the PDFBox community to tell me whether this is a good
solution, and how the character code is calculated.
This old post suggests that UTF-16 is not handled for string objects:
http://forums.adobe.com/thread/285502
Is that true?
Regards,
Kévin
2012/3/21 Kévin Sailly <[email protected]>
> Hello,
>
> Using PDFBox looks better for me as the code allow direct interaction with
> PDF building, as I can directly insert PDF commands. So I am sure that I
> can have full control on generated PDF.
>
> But I will have a look to FOP to see if it can feed my needs.
>
> Thanks,
> Kévin
>
>
>
> 2012/3/20 mehdi houshmand <[email protected]>
>
>> Hi Martin,
>>
>> I'm not sure I'd necessarily agree with you there, but I'm sure you've
>> done
>> your homework. Sorry I can't help with the PDFBox issue.
>>
>> Mehdi
>>
>> On 20 March 2012 15:23, Martin Hentschel <[email protected]> wrote:
>>
>> > The reason for choosing PDFBox instead of Apache FOP is that it (a) is
>> > easier to generate simple PDFs, and (b) uses less disk space.
>> >
>> > Martin
>> >
>> > On 20.03.2012, at 08:59, mehdi houshmand wrote:
>> >
>> > > Hi Guys,
>> > >
>> > > Why is it you're using PDFBox to create these PDFs? Have you
>> considered
>> > > using Apache FOP (http://xmlgraphics.apache.org/fop/)? Is there a
>> > specific
>> > > reason you're using PDFBox specifically?
>> > >
>> > > Mehdi
>> > >
>> > > On 20 March 2012 06:49, Kévin Sailly <[email protected]> wrote:
>> > >
>> > >> Hello,
>> > >>
>> > >> The same for me.
>> > >>
>> > >>
>> >
>> https://issues.apache.org/jira/browse/PDFBOX-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel#issue-tabs
>> > >>
>> > >> Kévin
>> > >>
>> > >> 2012/3/20 Martin Hentschel <[email protected]>
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>> I want to follow up on this message "€ in PDF":
>> > >>> http://markmail.org/message/h4skurvieiyk7izq
>> > >>>
>> > >>> I spent hours trying to figure out how to create a PDF document
>> > >> containing
>> > >>> the Euro symbol. The output is always garbled. Same happens for
>> > endash
>> > >>> and emdash symbols.
>> > >>>
>> > >>> Attached you find a text file and an output PDF created by the
>> > TextToPDF
>> > >>> utility.
>> > >>>
>> > >>> Thanks for your help,
>> > >>>
>> > >>> Martin
>> > >>>
>> > >>> (I also tried using TTF fonts, but didn't help.)
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> >
>> >
>>
>
>