Yes, it helps. Thank you for the prompt answer! I wonder why the string returned by getUnicode() contains the separate characters "f" and "i" instead of the single ligature character. Is there some way I can configure PDFTextStripper to decode the text as it is stored in the PDF?
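To check my understanding of the explanation, here is a minimal plain-Java sketch (no PDFBox involved) that counts the character codes in the string operand from the content stream quoted below and compares that with the length of the decoded text. The class and method names are made up for illustration. The mapping of code \037 to "fi" and the reading of the remaining codes as Latin-1 are assumptions; in a real PDF that mapping comes from the font's encoding or ToUnicode data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: models "one code, possibly several chars".
class LigatureWidths {

    static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    // Parses the body of a PDF literal string into character codes.
    // Only octal escapes (\ddd) are handled; other escapes are out of scope here.
    public static int[] parseCodes(String literal) {
        List<Integer> codes = new ArrayList<>();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i);
            if (c == '\\' && i + 1 < literal.length() && isOctal(literal.charAt(i + 1))) {
                int value = 0;
                int digits = 0;
                i++; // skip the backslash
                while (i < literal.length() && digits < 3 && isOctal(literal.charAt(i))) {
                    value = value * 8 + (literal.charAt(i) - '0');
                    i++;
                    digits++;
                }
                codes.add(value & 0xFF);
            } else {
                codes.add((int) c);
                i++;
            }
        }
        return codes.stream().mapToInt(Integer::intValue).toArray();
    }

    // Assumption: code 31 (octal 037) is the "fi" ligature glyph and decodes to two
    // chars; every other code is treated as a Latin-1 character.
    public static String toUnicode(int code) {
        return code == 31 ? "fi" : String.valueOf((char) code);
    }

    public static String decode(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(toUnicode(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Second string operand of the TJ array, with backslashes escaped for Java.
        int[] codes = parseCodes("usti\\037ca\\347\\343o");
        String text = decode(codes);
        System.out.println("character codes: " + codes.length);
        System.out.println("decoded chars:   " + text.length());
    }
}
```

If this model is right, an array with one entry per character code can never line up with the decoded string once a ligature is involved, which matches what my test case prints.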

On Tue, Jul 19, 2016 at 4:47 PM Tilman Hausherr <[email protected]> wrote:

> On 19.07.2016 at 20:43, Ygor Mutti wrote:
> > Hi!
> >
> > The javadoc states that the TextPosition.getIndividualWidths() method
> > returns "An array that is the same length as the length of the string."
> > Here is a gist containing a test case in which this statement is false:
> > https://gist.github.com/ygormutti/d40a80d425d552151625a063fb29c9ca
>
> I'd say the javadoc is wrong. It is the length of the CharacterCodes
> array, not the length of the unicode string. The "fi" in Justificação is
> one glyph, a ligature.
>
> This is the content stream:
>
> [ (J) 20 (usti\037ca\347\343o) ] TJ
>
> Does this explanation help?
>
> Tilman
>
> > It prints a line for two cases where the TextPosition.getUnicode() returns
> > "fi" while at the same time TextPosition.getIndividualWidths() returns an
> > array containing a single float.
> >
> > I've tried to pin down the version in which this behavior has been
> > introduced and found out it works as expected in 1.2.1 release and does not
> > since 1.3.0.
> >
> > Should I open a ticket for this?

