Package: poppler-utils
Version: 0.12.4-1.2
Severity: normal

I like the idea behind pdftotext, and have been
using it a lot.

Unfortunately, I recently discovered what appears
to be silent data corruption.

It changed the minus sign, "-", to "2" in tables
in a scientific paper.

Maybe we agree that corrupting data in scientific
PDFs is a serious problem.

Here's how I noticed it:

1.) Download a copy of the PDF file at the URL
below (I saved it under the file name used in
step 2):

    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2825258/pdf/pone.0009339.pdf

2.) At the shell prompt, type

    $ pdftotext -layout /tmp/2010-03-Cinnamon\ increases\ life\ span.pdf - | less

3.) Scroll down in less to Table 3.

4.) Look for the line that begins with
"Atractylodes japonica".

5.) See that the value in the column titled
"% change" is "21.1".

6.) Look at the same number in the original PDF
file. It should be "-1.1"! The "-" was silently
corrupted to "2".

Other numbers in the same column that should
begin with "-" were also corrupted to "2".
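
The manual check in steps 2 to 5 could perhaps be
scripted along these lines. This is only a sketch:
the function names row_for_label and show_row are
made up for illustration, and it assumes pdftotext
and grep are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: print every line on stdin that contains the label.
# -F treats the label as a fixed string rather than a regular expression;
# "--" keeps a label that starts with "-" from being parsed as an option.
row_for_label() {
    grep -F -- "$1"
}

# Hypothetical wrapper: extract the text with the layout preserved
# ("-" sends the output to stdout) and show only the matching rows.
show_row() {
    pdftotext -layout "$1" - | row_for_label "$2"
}
```

For example, calling
show_row "/tmp/2010-03-Cinnamon increases life span.pdf" "Atractylodes japonica"
should print just the affected table row, so the
"% change" value can be compared with the PDF
without paging through less.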

Thanks,
Kingsley

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (990, 'unstable'), (500, 'lenny'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.32-5-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages poppler-utils depends on:
ii  libc6                       2.11.2-7     Embedded GNU C Library: Shared lib
ii  libfontconfig1              2.8.0-2.1    generic font configuration library
ii  libgcc1                     1:4.4.5-10   GCC support library
ii  libpoppler5                 0.12.4-1.1   PDF rendering library
ii  libstdc++6                  4.4.5-10     The GNU Standard C++ Library v3
ii  libxml2                     2.7.6.dfsg-1 GNOME XML library

Versions of packages poppler-utils recommends:
ii  ghostscript                 8.71~dfsg2-6 The GPL Ghostscript PostScript/PDF

poppler-utils suggests no packages.

-- no debconf information


