Re: [poppler] Incompatible number of glyphs from glib get_text{, layout}

Peter Waller Tue, 26 May 2015 09:45:23 -0700

I learned a bit more about PDFs today :)

I believe I've found the offending TJ:

/C0_0 1 Tf
15.9927 0 9.0157 13.2093 304.8821 331.25 Tm
[<00170016001000000037>55<000e>74<00000033>9<0057>4<00410052>-24<0054005a00560049004c004c004500000032>-4<0044>20<000e>]TJ

Font:

...

/Font <<
/C0_0 18 0 R

...

%% Original object ID: 123 0
18 0 obj
<<
  /BaseFont /CDGGAZ+Myriad-Roman
  /DescendantFonts 66 0 R
  /Encoding /Identity-H
  /Subtype /Type0
  /Type /Font
>>
endobj

Notably, it's missing a /ToUnicode entry, which all of the other fonts
have. I inspected the font object, which has `/Subtype /CIDFontType0C`,
and extracted it using pdftosrc. Unfortunately, `file` does not
recognize the format and I'm struggling to find anything able to read
it. Hints appreciated.

So, is there a poppler bug here? It seems that the glib API is emitting
the raw Identity-H character codes (including nulls) through
poppler_page_get_text, and the embedded nulls cut the C string short.
Should the API instead drop those characters for which there is no
Unicode mapping?
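
To make the suspicion concrete, here is a rough Python sketch that decodes the hex strings of the TJ array quoted above as two-byte Identity-H codes. The pass-through step (treating each code as a Unicode code point because there is no /ToUnicode) is only my assumption about what might be happening, not something I have confirmed in the poppler source. The helper name cids_from_tj is made up for this illustration.

```python
import re

# The TJ array from the content stream, copied verbatim.
TJ_ARRAY = (
    "[<00170016001000000037>55<000e>74<00000033>9<0057>4<00410052>-24"
    "<0054005a00560049004c004c004500000032>-4<0044>20<000e>]"
)


def cids_from_tj(tj: str) -> list[int]:
    """Return the 2-byte character codes from every <hex> string in a TJ array."""
    cids = []
    for hex_string in re.findall(r"<([0-9A-Fa-f]+)>", tj):
        raw = bytes.fromhex(hex_string)
        # Identity-H: each code is two bytes, big-endian, and equals the CID.
        for i in range(0, len(raw), 2):
            cids.append(int.from_bytes(raw[i:i + 2], "big"))
    return cids


cids = cids_from_tj(TJ_ARRAY)

# ASSUMPTION: with no /ToUnicode, the codes are passed through unchanged
# as code points. This is a guess at the behaviour, not verified.
text = "".join(chr(c) for c in cids)
utf8 = text.encode("utf-8")

# A C caller using strlen() stops at the first NUL byte.
c_string_view = utf8.split(b"\x00", 1)[0]

print("codes:", [f"{c:04x}" for c in cids])
print("glyph count:", len(cids))
print("NUL codes:", cids.count(0))
print("bytes in buffer:", len(utf8))
print("bytes visible to strlen():", len(c_string_view))
```

If that assumption holds, the string returned to C would look much shorter than the number of glyphs the layout API reports, which would match the mismatch I am seeing.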

Thanks in advance!

On 26 May 2015 at 12:56, Peter Waller <[email protected]> wrote:
> I forgot to note that I transformed unprintable characters to "X" in
> my dumped representation.
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
