Hello Albert,

On 30/04/2008, at 7:56 AM, Albert Astals Cid wrote:

> Available from
> http://poppler.freedesktop.org/poppler-0.8.2.tar.gz
> Testing, patches and bug reports welcome.

I joined this list recently, to see whether the Poppler versions of the Xpdf utilities worked any differently from the non-Poppler versions.

I'm working on a Mac, with MacOS X v10.4.11, and have successfully built the utilities from this latest release. All of pdfinfo, pdffonts, pdftohtml, pdftotext, pdftops, pdftoppm and pdfimages work fine on a simple 1-page PDF that I created with pdfTeX. However, all of these fail with a "Bus error" on more complicated multi-page PDFs, which you can find here:

  http://www.maths.mq.edu.au/~ross/5019-e-cmap.pdf
  http://www.maths.mq.edu.au/~ross/5019-e-mmap.pdf

I'm particularly interested in pdffonts, pdftohtml and pdftotext, as I want a free tool to be able to correctly extract the text from documents such as the above PDFs. They must extract the *complete* textual contents, using the CMap font-encoding resources that these PDFs contain.

Non-Poppler versions of the utilities; e.g.

  rossmoor% pdftotext -v
  pdftotext version 3.02
  Copyright 1996-2007 Glyph & Cog, LLC

work to some extent, but certainly not completely. (pdfimages works but the output is incomplete and useless, and pdftoppm also gives a Bus error.)

For example, this is part of the text extracted from 5019-e-mmap.pdf using pdftotext (v3.02):

  Figure 1: The Moebius strip.
  Consider the two-sheeted covering \pi : \BbbS 2 \rightar P and the inverse image \pi - 1 (L) of one of these circles.

It's pretty good, except that \rightarrow has been truncated to 8 characters. There are many similar instances within the full text. However, the Poppler version doesn't get far enough through the document to see this --- at least not for me.

BTW, the text selection in Adobe Reader (versions 7.* & 8.*) does extract the text more completely; so there is either a bug or a design flaw within the pdftotext utility.

> Albert

Hope this helps, and that you can help me.

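To narrow down which utilities crash on which file, the checks described above could be scripted. The following is a minimal Python sketch, not something from the original report. The helper names `describe_exit` and `run_tool` are hypothetical. It assumes a POSIX system, where a crash such as a bus error shows up as a negative return code. It only covers the three utilities that need no output-file argument; pdftotext is given `-` so that it writes to stdout.

```python
import signal
import subprocess
import sys

# Assumption: only the utilities that can run without an output-file argument.
# pdftotext is passed "-" so the extracted text goes to stdout, not to a file.
COMMANDS = {
    "pdfinfo": lambda pdf: ["pdfinfo", pdf],
    "pdffonts": lambda pdf: ["pdffonts", pdf],
    "pdftotext": lambda pdf: ["pdftotext", pdf, "-"],
}


def describe_exit(returncode):
    """Turn a subprocess return code into a short status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal
        # (a "Bus error" corresponds to SIGBUS).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exit status %d" % returncode


def run_tool(argv, timeout=60):
    """Run one command, discard its output, and report how it ended."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except FileNotFoundError:
        return "not found"
    except subprocess.TimeoutExpired:
        return "timed out"
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Usage: python check_pdfs.py 5019-e-cmap.pdf 5019-e-mmap.pdf
    for pdf in sys.argv[1:]:
        for tool, build_argv in COMMANDS.items():
            print("%s %s: %s" % (tool, pdf, run_tool(build_argv(pdf))))
```

Running this once with the Poppler build first on the PATH, and once with the Xpdf 3.02 build first, would give a side-by-side list of which utility fails on which PDF.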
Cheers,

Ross

------------------------------------------------------------------------
Ross Moore                                       [EMAIL PROTECTED]
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
