Is the difference the italic text "760 W. Swartzville Rd.  Reinholds, PA 17569"?
That is not the address of Zook Interiors, right?
Is that a hidden mark added by the person who created the PDF?
Maybe they intentionally used an incorrect encoding.
Then the question might be how the two different methods of extracting 
information respond to invalid data in the PDF.
pdftotext does not handle that text correctly, and ps2ascii (from Ghostscript 
9.16) fails on it with:
**** Warning: considering '0000000000 XXXXX n' as a free entry.
*** Warning: composite font characters dumped without decoding.
If a PDF breaks both poppler and ghostscript, the problem is probably the PDF.
pdfinfo shows that the file was made by pdftk 1.44, so it could be a bug or 
an intentional change in pdftk.
William

From: [email protected]
Date: Tue, 26 May 2015 10:53:52 +0100
To: [email protected]
Subject: Re: [poppler] Incompatible number of glyphs from glib get_text{, layout}

On 17 January 2014 at 10:30, Peter Waller <[email protected]> wrote:
A screenshot from the poppler glib demo app demonstrates this, attached below. 
Poppler gets 696 characters and 1261 layout rectangles.
<snip>
http://pwaller.net/sw/2014-01-17-broken.pdf
<snip>
I've reported this on bugzilla here: 
https://bugs.freedesktop.org/show_bug.cgi?id=73885

Link to old thread: http://thread.gmane.org/gmane.comp.freedesktop.poppler/8683 
 
I've investigated this briefly. An observation:

http://cgit.freedesktop.org/poppler/poppler/tree/glib/poppler-page.cc?id=poppler-0.33.0#n825

sel_text->getLength() returns 1283, which does not match the 1261 rectangles 
reported by poppler_page_get_text_layout().

If I change this to use g_strndup() with the correct length:

result = g_strndup (sel_text->getCString (), sel_text->getLength());

Looking at result[696:] afterwards, I find that the rest of the string is 
filled with 0 bytes.
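
To illustrate the kind of mismatch described above, here is a minimal Python sketch using ctypes. It is not poppler code: the buffer, the text "hello" and the reported length of 8 are hypothetical stand-ins for the selected-text buffer, the 696 readable characters and the 1283 reported by getLength(). It only shows how a NUL-terminated view of a buffer can be shorter than the length the buffer reports.

```python
import ctypes

# Hypothetical stand-in values (not taken from the PDF in this thread):
# 5 bytes of text inside a buffer whose reported length is 8 bytes.
TEXT = b"hello"
REPORTED_LENGTH = 8


def c_string_view(buf):
    """Bytes up to the first NUL, as a NUL-terminated copy would see them."""
    return buf.value


def length_aware_view(buf, length):
    """The first `length` bytes of the buffer, including any NUL bytes."""
    return buf.raw[:length]


def count_trailing_nuls(data):
    """Number of NUL bytes at the end of `data`."""
    return len(data) - len(data.rstrip(b"\x00"))


# create_string_buffer zero-fills the buffer, so the 3 bytes after
# the text are NUL padding.
buf = ctypes.create_string_buffer(TEXT, REPORTED_LENGTH)

short = c_string_view(buf)
full = length_aware_view(buf, REPORTED_LENGTH)

print(len(short), len(full), count_trailing_nuls(full))
```

One caveat that may be worth checking: GLib documents g_strndup() as padding the result with NULs when the source string is shorter than n, so the zero bytes after offset 696 could come from the copy itself rather than from the original buffer.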

I'm extremely keen to get this fixed, so any pointers would be appreciated. The 
rate of encountering this bug is increasing all the time!

Thanks,

- Peter


_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler                           
          