Albert Astals Cid wrote:
> Hi Ed, i'm getting lots of "Could not parse charref for nameToUnicode" after
> applying your latest patch for Adobe Glyph Naming convention in
> http://home.zcu.cz/~jklement/spolehlivost.pdf
>
> Is it normal?

It would probably be better to remove the warnings when a glyph name
cannot be parsed. The above PDF file has a toUnicode map, so the glyph
names are not required for text extraction.

> BTW, i'm getting the same output than without your patch, but so many
> warnings "scare" me.

As the PDF has a toUnicode map, the glyph names are not used for
copy/paste of text, so there will be no difference in output.

I've created a test file to test the patch:

http://annarchy.freedesktop.org/~ajohnson/test.pdf

The numbers "1", "2", and "3" are mapped to the text "test", "text",
and "the". The "Z" has the glyph name "g1", so it should be ignored when
extracting text.

I have found a bug in the code. With the test file I get:

$ pdftotext test.pdf -
Error: Could not parse charref for nameToUnicode: g1
This is = test of text extr=?tion using the glyph n=mes

The output should be:

This is a test of text extraction using the glyph names

It looks like the glyph names "u00061" and "u0063" are not decoded
correctly.
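
For reference, here is a minimal sketch of how the "uXXXX" and "uniXXXX" forms of the Adobe glyph naming convention are meant to decode, as I understand the rules. It is written in Python for brevity and is not the poppler code. The function names (decode_component, glyph_name_to_text) are made up for illustration, and the lookup of ordinary named glyphs in the Adobe Glyph List is deliberately left out, so a name such as "space" would come back empty here.

```python
import re

# "uni" followed by one or more groups of exactly four uppercase hex digits.
_UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" followed by four to six uppercase hex digits.
_U_RE = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(cp):
    """True if cp is a Unicode scalar value (surrogates excluded)."""
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def decode_component(component):
    """Decode one underscore-separated component; '' if it cannot be parsed."""
    m = _UNI_RE.fullmatch(component)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
        return ""
    m = _U_RE.fullmatch(component)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _is_scalar(cp) else ""
    # Assumption: anything else (e.g. "g1") is silently ignored, no warning.
    return ""


def glyph_name_to_text(name):
    """Drop any '.suffix', split on '_', and decode each component."""
    base = name.split(".", 1)[0]
    return "".join(decode_component(c) for c in base.split("_"))
```

With this sketch, "u00061" decodes to "a" and "u0063" to "c", while "g1" yields an empty string without any warning, which is the behaviour the test file expects.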
