Re: UTF16 encoded string to PDFDocEncoding

2017-07-11 Thread Tilman Hausherr

fixed in https://issues.apache.org/jira/browse/PDFBOX-3864

Tilman

On 11.07.2017 at 16:06, Tilman Hausherr wrote:
The cause is "gaps" in the PDFDocEncoding specification that were
missed in the implementation. I'll create an issue later.
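To illustrate what such gaps mean, here is a minimal Java sketch. It is not PDFBox code and not the actual PDFBOX-3864 fix, and the class and method names are hypothetical. It assumes the PDFDocEncoding table from the PDF specification: bytes 0x18-0x1F and 0x80-0xA0 map to characters other than their ISO-8859-1 counterparts (0xA0 is the euro sign), and 0x7F, 0x9F and 0xAD are undefined. Treating control bytes 0x00-0x17 as pass-through is a simplifying assumption. The writer emits PDFDocEncoding only when every character has a byte in that table; otherwise it falls back to UTF-16BE with an FE FF marker, which keeps U+00A0 intact.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class PdfDocEncodingSketch {
    // Bytes 0x18..0x1F: spacing accents instead of ISO-8859-1 control characters
    private static final char[] ACCENTS = {
        '\u02D8', '\u02C7', '\u02C6', '\u02D9', '\u02DD', '\u02DB', '\u02DA', '\u02DC'
    };
    // Bytes 0x80..0xA0; 0 marks the undefined 0x9F slot, the last entry (0xA0) is the euro sign
    private static final char[] HIGH = {
        '\u2022', '\u2020', '\u2021', '\u2026', '\u2014', '\u2013', '\u0192', '\u2044',
        '\u2039', '\u203A', '\u2212', '\u2030', '\u201E', '\u201C', '\u201D', '\u2018',
        '\u2019', '\u201A', '\u2122', '\uFB01', '\uFB02', '\u0141', '\u0152', '\u0160',
        '\u0178', '\u017D', '\u0131', '\u0142', '\u0153', '\u0161', '\u017E', 0,
        '\u20AC'
    };
    private static final Map<Character, Integer> UNICODE_TO_BYTE = new HashMap<>();

    static {
        for (int b = 0; b < 256; b++) {
            // The "gaps": byte values whose meaning differs from ISO-8859-1 or is undefined
            boolean gap = (b >= 0x18 && b <= 0x1F) || (b >= 0x7F && b <= 0xA0) || b == 0xAD;
            if (!gap) {
                UNICODE_TO_BYTE.put((char) b, b);
            }
        }
        for (int i = 0; i < ACCENTS.length; i++) {
            UNICODE_TO_BYTE.put(ACCENTS[i], 0x18 + i);
        }
        for (int i = 0; i < HIGH.length; i++) {
            if (HIGH[i] != 0) { // skip the undefined 0x9F slot
                UNICODE_TO_BYTE.put(HIGH[i], 0x80 + i);
            }
        }
    }

    /** True if every char of text has a byte in PDFDocEncoding. */
    static boolean canEncode(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!UNICODE_TO_BYTE.containsKey(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** PDFDocEncoding bytes if possible, otherwise FE FF followed by UTF-16BE. */
    static byte[] encodeTextString(String text) {
        if (canEncode(text)) {
            byte[] out = new byte[text.length()];
            for (int i = 0; i < text.length(); i++) {
                out[i] = UNICODE_TO_BYTE.get(text.charAt(i)).byteValue();
            }
            return out;
        }
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE); // no BOM is added by Java
        byte[] out = new byte[utf16.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(utf16, 0, out, 2, utf16.length);
        return out;
    }

    public static void main(String[] args) {
        String title = "Chapter\u00A0"; // "Chapter" followed by a no-break space, as in the test
        System.out.println(canEncode(title));                // false
        System.out.println(encodeTextString(title).length);  // 18: FE FF plus 8 UTF-16BE code units
    }
}
```

Because U+00A0 has no PDFDocEncoding byte (0xA0 is taken by the euro sign), the title falls back to UTF-16BE instead of being written as a single byte that a reader would display as €.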


Tilman

On 10.07.2017 at 19:22, Andrea Vacondio wrote:
Hi, we came across this case where we are basically cloning outline items
where the original outline title is a UTF-16BE encoded text string
containing the value 00A0 (no-break space). We later use the string to
assign the title of a new outline item, and the A0 is recognised as a € sign.

Here is a simple test:

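 import org.apache.pdfbox.cos.COSString;
 import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;

 // UTF-16BE text string with BOM: "Chapter" followed by 00A0 (no-break space)
 COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
 PDOutlineItem node = new PDOutlineItem();
 node.setTitle(victim.getString());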

If you look at the node dictionary you'll see that the title value is
Chapter€




-
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org







Re: UTF16 encoded string to PDFDocEncoding

2017-07-11 Thread Tilman Hausherr
The cause is "gaps" in the PDFDocEncoding specification that were
missed in the implementation. I'll create an issue later.


Tilman

On 10.07.2017 at 19:22, Andrea Vacondio wrote:

Hi, we came across this case where we are basically cloning outline items
where the original outline title is a UTF-16BE encoded text string
containing the value 00A0 (no-break space). We later use the string to
assign the title of a new outline item, and the A0 is recognised as a € sign.
Here is a simple test:

 COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
 PDOutlineItem node = new PDOutlineItem();
 node.setTitle(victim.getString());

If you look at the node dictionary you'll see that the title value is
Chapter€







Re: UTF16 encoded string to PDFDocEncoding

2017-07-11 Thread Andrea Vacondio
I'm talking about the node dictionary; try adding this:
System.out.println(node.getTitle());
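The difference between the two checks can be sketched in Java: `getString()` only decodes the UTF-16BE bytes, while `setTitle` has to encode the string again. The sketch below is hypothetical and is not PDFBox code; it assumes a writer that treats PDFDocEncoding as if it were ISO-8859-1, which is one way the missed gaps could produce "Chapter€". Its reader models only the 0xA0 slot of PDFDocEncoding (the euro sign) and decodes every other single byte as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

final class NbspRoundTrip {
    // Parse a hex string such as the one passed to COSString.parseHex in the thread
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Reader side: an FE FF prefix means UTF-16BE, otherwise a single-byte string.
    // Simplification: only the 0xA0 -> U+20AC (euro) slot of PDFDocEncoding is modelled.
    static String decode(byte[] b) {
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return new String(b, 2, b.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(b, StandardCharsets.ISO_8859_1).replace('\u00A0', '\u20AC');
    }

    // Hypothetical buggy writer: assumes any char <= U+00FF can be stored as one byte
    static byte[] naiveEncode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                byte[] utf16 = s.getBytes(StandardCharsets.UTF_16BE);
                byte[] out = new byte[utf16.length + 2];
                out[0] = (byte) 0xFE;
                out[1] = (byte) 0xFF;
                System.arraycopy(utf16, 0, out, 2, utf16.length);
                return out;
            }
        }
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String title = decode(hexToBytes("FEFF004300680061007000740065007200A0"));
        System.out.println(title.equals("Chapter\u00A0"));    // decoding keeps the no-break space
        String reread = decode(naiveEncode(title));
        System.out.println(reread.equals("Chapter\u20AC"));   // re-encoding turned it into a euro sign
    }
}
```

Both lines print `true`: the decoded title is correct, and only the naive re-encoding step changes the no-break space into a euro sign, which matches what the node dictionary showed.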

On Tue, Jul 11, 2017 at 12:20 PM, Andreas Lehmkühler wrote:

>
> > Andreas Lehmkühler wrote on 11 July 2017 at 12:17:
> >
> >
> >
> > > Andrea Vacondio wrote on 10 July 2017 at 19:22:
> > >
> > >
> > > Hi, we came across this case where we are basically cloning outline items
> > > where the original outline title is a UTF-16BE encoded text string
> > > containing the value 00A0 (no-break space). We later use the string to
> > > assign the title of a new outline item, and the A0 is recognised as a € sign.
> > > Here is a simple test:
> > >
> > > COSString victim = COSString
> > > .parseHex("FEFF004300680061007000740065007200A0");
> > > PDOutlineItem node = new PDOutlineItem();
> > > node.setTitle(victim.getString());
> > >
> > > If you look at the node dictionary you'll see that the title value is
> > > Chapter€
> > How do you look at the dictionary?
> >
> > The following code:
> > COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> > System.out.println(victim.toHexString());
> > System.out.println(victim.getString());
> Oops, something is missing.
>
> The output looks good to me:
> FEFF004300680061007000740065007200A0
> Chapter
> Note that the second line ends with a space (U+00A0, the no-break space).
>
>
> Andreas
>
>
>


Re: UTF16 encoded string to PDFDocEncoding

2017-07-11 Thread Andreas Lehmkühler

> Andreas Lehmkühler wrote on 11 July 2017 at 12:17:
> 
> 
> 
> > Andrea Vacondio wrote on 10 July 2017 at 19:22:
> > 
> > 
> > Hi, we came across this case where we are basically cloning outline items
> > where the original outline title is a UTF-16BE encoded text string
> > containing the value 00A0 (no-break space). We later use the string to
> > assign the title of a new outline item, and the A0 is recognised as a € sign.
> > Here is a simple test:
> > 
> > COSString victim = COSString
> > .parseHex("FEFF004300680061007000740065007200A0");
> > PDOutlineItem node = new PDOutlineItem();
> > node.setTitle(victim.getString());
> > 
> > If you look at the node dictionary you'll see that the title value is
> > Chapter€
> How do you look at the dictionary?
> 
> The following code:
> COSString victim = COSString.parseHex("FEFF004300680061007000740065007200A0");
> System.out.println(victim.toHexString());
> System.out.println(victim.getString());
Oops, something is missing.

The output looks good to me:
FEFF004300680061007000740065007200A0
Chapter 
Note that the second line ends with a space (U+00A0, the no-break space).


Andreas
