Yes, it helps. Thank you for the prompt answer!

I wonder why the string returned by getUnicode contains the separate
characters "f" and "i" instead of the single ligature character. Is there
some way I can configure PDFTextStripper to decode it as it is in the PDF?
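
For anyone following along, here is a minimal sketch of the mismatch using
only the JDK, not PDFBox. It assumes the glyph in question corresponds to
U+FB01 (LATIN SMALL LIGATURE FI) and that the decoded text ends up
compatibility-normalized; I have not confirmed that this is what PDFBox does
internally. The class name, the expand() helper and the width value are made
up for illustration.

```java
import java.text.Normalizer;

public class LigatureDemo {
    // Expands compatibility characters such as U+FB01 into their plain
    // letters using NFKC normalization (standard JDK API).
    static String expand(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // One character code in the content stream: the "fi" ligature glyph.
        String ligature = "\uFB01";
        // Hypothetical width: one entry per character code, not per char.
        float[] widths = { 0.5f };

        String decoded = expand(ligature);
        // The decoded string has two chars, the widths array one entry.
        System.out.println(decoded + " " + decoded.length() + " " + widths.length);
        // prints "fi 2 1"
    }
}
```

Under these assumptions the string length (2) and the number of widths (1)
differ for exactly the reason Tilman describes below.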

On Tue, Jul 19, 2016 at 4:47 PM Tilman Hausherr <[email protected]>
wrote:

> On 19.07.2016 at 20:43, Ygor Mutti wrote:
> > Hi!
> >
> > The javadoc states that the TextPosition.getIndividualWidths() method
> > returns "An array that is the same length as the length of the string."
> > Here is a gist containing a test case in which this statement is false:
> > https://gist.github.com/ygormutti/d40a80d425d552151625a063fb29c9ca
>
> I'd say the javadoc is wrong. It is the length of the CharacterCodes
> array, not the length of the unicode string. The "fi" in Justificação is
> one glyph, a ligature.
>
> This is the content stream:
>
> [ (J) 20 (usti\037ca\347\343o) ] TJ
>
> Does this explanation help?
>
> Tilman
>
> >
> > It prints a line for two cases where TextPosition.getUnicode() returns
> > "fi" while, at the same time, TextPosition.getIndividualWidths() returns
> > an array containing a single float.
> >
> > I've tried to pin down the version in which this behavior was introduced
> > and found that it works as expected in the 1.2.1 release and has not
> > since 1.3.0.
> >
> > Should I open a ticket for this?
> >
>
>
