Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal

I like the idea behind pdftotext and have been using it a lot. Unfortunately, I recently discovered that it corrupts data: it changed the minus sign, "-", to "2" in tables in a scientific paper. I think we can agree that corrupting data in scientific PDFs is a serious problem.

Here's how I noticed it:

1.) Download a copy of the PDF file at
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2825258/pdf/pone.0009339.pdf
2.) At the shell prompt, type
    $ pdftotext -layout /tmp/2010-03-Cinnamon\ increases\ life\ span.pdf - | less
3.) Scroll down in less to Table 3.
4.) Look for the line that begins with "Atractylodes japonica".
5.) See that the value in the column titled "% change" is "21.1".
6.) Look at the same number in the original PDF file. It should be "-1.1"!

The "-" was silently corrupted to "2". Other numbers in the same column that should begin with "-" were also corrupted to "2".

Thanks,
Kingsley

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'lenny'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages poppler-utils depends on:
ii  libc6            2.11.2-7       Embedded GNU C Library: Shared lib
ii  libfontconfig1   2.8.0-2.1      generic font configuration library
ii  libgcc1          1:4.4.5-10     GCC support library
ii  libpoppler5      0.12.4-1.1     PDF rendering library
ii  libstdc++6       4.4.5-10       The GNU Standard C++ Library v3
ii  libxml2          2.7.6.dfsg-1   GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript      8.71~dfsg2-6   The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information
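
For anyone re-checking this, here is a minimal Python sketch of steps 2 to 5. It assumes Python 3.7+ and poppler-utils' pdftotext on PATH. It runs pdftotext in -layout mode with output to stdout ("-", as in step 2) and prints every line that begins with a given row label, so the "% change" value can be compared by eye against the original PDF (step 6). The helper names (extract_layout_text, find_rows) are hypothetical, and passing "-enc UTF-8" is an assumption about the desired output encoding, not something the report specifies.

```python
import subprocess
import sys
from typing import List


def extract_layout_text(pdf_path: str) -> str:
    """Hypothetical helper: run pdftotext in layout mode and return its text.

    "-" as the output file makes pdftotext write to stdout, as in step 2.
    "-enc UTF-8" is an assumed choice of output encoding.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_rows(text: str, label: str) -> List[str]:
    """Hypothetical helper: return lines whose first non-blank text starts with label."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if line.lstrip().startswith(label)
    ]


def main(argv: List[str]) -> None:
    if len(argv) < 2:
        print(f"usage: {argv[0]} FILE.pdf [ROW-LABEL]", file=sys.stderr)
        return
    pdf_path = argv[1]
    # Default to the row named in step 4 of the report.
    label = argv[2] if len(argv) > 2 else "Atractylodes japonica"
    for row in find_rows(extract_layout_text(pdf_path), label):
        print(row)


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a hypothetical name such as find_row.py, it would be run as `python3 find_row.py /tmp/2010-03-Cinnamon\ increases\ life\ span.pdf "Atractylodes japonica"`. With the affected version, the printed row should show "21.1" where the PDF shows "-1.1".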