Re: UTF16 encoded string to PDFDocEncoding
Fixed in https://issues.apache.org/jira/browse/PDFBOX-3864

Tilman

On 11.07.2017 at 16:06, Tilman Hausherr wrote:
> The cause is "gaps" in the PDFDocEncoding specification that were
> missed in the implementation. I'll create an issue later.
>
> Tilman
>
> On 10.07.2017 at 19:22, Andrea Vacondio wrote:
>> Hi,
>> we came across this case where we are basically cloning outline items
>> where the original outline title is a UTF-16BE-encoded text string
>> containing the value 00A0 (non-breaking space). We later use the string
>> to assign the title in a new outline item, and the A0 is recognised as
>> a € sign.
>> Here is a simple test:
>>
>>     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
>>     PDOutlineItem node = new PDOutlineItem();
>>     node.setTitle(victim.getString());
>>
>> If you look at the node dictionary you'll see that the title value is
>> Chapter€

To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
Re: UTF16 encoded string to PDFDocEncoding
The cause is "gaps" in the PDFDocEncoding specification that were missed
in the implementation. I'll create an issue later.

Tilman

On 10.07.2017 at 19:22, Andrea Vacondio wrote:
> Hi,
> we came across this case where we are basically cloning outline items
> where the original outline title is a UTF-16BE-encoded text string
> containing the value 00A0 (non-breaking space). We later use the string
> to assign the title in a new outline item, and the A0 is recognised as
> a € sign.
> Here is a simple test:
>
>     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
>     PDOutlineItem node = new PDOutlineItem();
>     node.setTitle(victim.getString());
>
> If you look at the node dictionary you'll see that the title value is
> Chapter€
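To illustrate the kind of "gap" described in this message, here is a minimal, self-contained Java sketch. It is not PDFBox's actual code: the class `PdfDocEncodingGapSketch`, its `leaveGapStale` flag and the `roundTrip` method are hypothetical names, and the table models only a single entry. It assumes, as I understand the PDFDocEncoding table, that byte 0xA0 stands for the euro sign, so U+00A0 has no single-byte code of its own. If a table is seeded with an identity mapping and the reverse entry for U+00A0 is never dropped, the non-breaking space gets written as byte 0xA0 and is read back as €.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a single-byte encoding table; NOT PDFBox's real code.
class PdfDocEncodingGapSketch {
    private final char[] codeToChar = new char[256];
    private final Map<Character, Integer> charToCode = new HashMap<>();

    // leaveGapStale is a hypothetical switch: true models the suspected bug.
    PdfDocEncodingGapSketch(boolean leaveGapStale) {
        // Step 1: seed an identity mapping (byte value == code point).
        for (int code = 0; code < 256; code++) {
            set(code, (char) code, false);
        }
        // Step 2: apply one deviation from the identity mapping.
        // Assumption: byte 0xA0 means the euro sign in PDFDocEncoding.
        set(0xA0, '\u20AC', !leaveGapStale);
    }

    private void set(int code, char c, boolean dropOldReverseEntry) {
        if (dropOldReverseEntry) {
            // Forget the character that previously occupied this code.
            charToCode.remove(Character.valueOf(codeToChar[code]));
        }
        codeToChar[code] = c;
        charToCode.put(c, code);
    }

    // Encodes text to single-byte codes and decodes it again.
    String roundTrip(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            Integer code = charToCode.get(c);
            if (code == null) {
                // Not representable: a writer would presumably fall back
                // to UTF-16, which is lossless, so return the text as-is.
                return text;
            }
            out.append(codeToChar[code]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Chapter\u00A0";
        // Stale reverse entry: the last character comes back as U+20AC.
        System.out.println(new PdfDocEncodingGapSketch(true).roundTrip(title));
        // Reverse entry removed: the title survives unchanged.
        System.out.println(new PdfDocEncodingGapSketch(false).roundTrip(title));
    }
}
```

With the stale entry the title comes back as Chapter€, which matches the reported symptom. With the entry removed, the sketch treats U+00A0 as not representable in the single-byte table and leaves the string untouched.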
Re: UTF16 encoded string to PDFDocEncoding
I'm talking about the node dictionary; try adding this:

    System.out.println(node.getTitle());

On Tue, Jul 11, 2017 at 12:20 PM, Andreas Lehmkühler wrote:
> Andreas Lehmkühler wrote on 11 July 2017 at 12:17:
> > Andrea Vacondio wrote on 10 July 2017 at 19:22:
> > > Hi, we came across this case where we are basically cloning outline
> > > items where the original outline title is a UTF-16BE-encoded text
> > > string containing the value 00A0 (non-breaking space). We later use
> > > the string to assign the title in a new outline item, and the A0 is
> > > recognised as a € sign.
> > > Here is a simple test:
> > >
> > >     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> > >     PDOutlineItem node = new PDOutlineItem();
> > >     node.setTitle(victim.getString());
> > >
> > > If you look at the node dictionary you'll see that the title value is
> > > Chapter€
> > How do you look at the dictionary?
> >
> > The following code:
> >
> >     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> >     System.out.println(victim.toHexString());
> >     System.out.println(victim.getString());
> Oops, something is missing.
>
> The output looks good to me:
>
>     FEFF004300680061007000740065007200A0
>     Chapter
>
> Note that the second line ends with a space.
>
> Andreas
Re: UTF16 encoded string to PDFDocEncoding
> Andreas Lehmkühler wrote on 11 July 2017 at 12:17:
> > Andrea Vacondio wrote on 10 July 2017 at 19:22:
> > Hi, we came across this case where we are basically cloning outline
> > items where the original outline title is a UTF-16BE-encoded text
> > string containing the value 00A0 (non-breaking space). We later use
> > the string to assign the title in a new outline item, and the A0 is
> > recognised as a € sign.
> > Here is a simple test:
> >
> >     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> >     PDOutlineItem node = new PDOutlineItem();
> >     node.setTitle(victim.getString());
> >
> > If you look at the node dictionary you'll see that the title value is
> > Chapter€
> How do you look at the dictionary?
>
> The following code:
>
>     COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
>     System.out.println(victim.toHexString());
>     System.out.println(victim.getString());

Oops, something is missing.

The output looks good to me:

    FEFF004300680061007000740065007200A0
    Chapter

Note that the second line ends with a space.

Andreas
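The trailing "space" in that output is easy to misread on a console, so here is a small sketch that decodes the same hex string using only the Java standard library and prints the code point of the last character. The class and method names (`HexTitleDecodeSketch`, `decodeUtf16Hex`) are made up for illustration, and this is not how PDFBox decodes a COSString internally; it only shows what the bytes mean as UTF-16 with a byte-order mark.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper only; the names are hypothetical and unrelated to PDFBox.
class HexTitleDecodeSketch {

    // Converts a hex string to bytes and decodes them as UTF-16.
    static String decodeUtf16Hex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        // The UTF_16 charset consumes a leading byte-order mark
        // (FE FF selects big-endian) and does not return it as a character.
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String title = decodeUtf16Hex("FEFF004300680061007000740065007200A0");
        char last = title.charAt(title.length() - 1);
        // Print the code point explicitly instead of relying on how the
        // console renders the final character.
        System.out.printf("length=%d last=U+%04X%n", title.length(), (int) last);
    }
}
```

The last character is U+00A0, a non-breaking space rather than an ordinary U+0020 space, which is why printing `victim.getString()` looks fine while the title stored in the outline dictionary can still come out wrong.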