Your message dated Fri, 13 May 2011 15:47:27 +0000
with message-id <[email protected]>
and subject line Bug#563443: fixed in python-pypdf 1.13-1
has caused the Debian Bug report #563443,
regarding python-pypdf: parsing not robust to whitespace
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
563443: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563443
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: python-pypdf
Version: 1.12-2
Severity: normal
While using pdfshuffler on PDF statements from my stock broker, on export I'd
consistently get an exception from pypdf. Note that pdfshuffler's own display,
along with evince, acroread, kpdf, etc. have no problem with these documents.
On inspection it turns out that pypdf's parsing is rather primitive
and doesn't handle the presence of extra spaces, linefeeds in place of
space, etc. Here is an example of PDF source causing problems:
9 0 obj
<<
/Type /Font
/Subtype /Type1
/Encoding 4 0 R
/BaseFont /Times-Bold
>> endobj
I will attach a patch that makes parsing more lax about whitespace in a few
places that were significant to my document. However, this is just the tip
of the iceberg. Unfortunately, the pypdf code is written in a rather low-level
fashion, and addressing the problem fully will be a large task.
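
As a rough illustration of the whitespace sensitivity described here, the
sketch below compares the two regular expressions that appear in the attached
patch to generic.py. The sample strings and the helper name
looks_like_reference are made up for this example; only the two patterns come
from the patch.

```python
import re

# Pattern before the patch: exactly one whitespace character between the numbers.
STRICT = re.compile(r"(\d+)\s(\d+)\sR[^a-zA-Z]")
# Pattern after the patch: one or more whitespace characters after the object number.
PATCHED = re.compile(r"(\d+)\s+(\d+)\sR[^a-zA-Z]")


def looks_like_reference(pattern, peek):
    """Return True if `peek` starts with something shaped like 'N G R'."""
    return pattern.match(peek) is not None


# Hypothetical inputs, not taken from the reporter's PDF.
single_space = "4 0 R\n"
double_space = "4  0 R\n"
space_then_newline = "4 \n0 R\n"
double_before_r = "4 0  R\n"

for sample in (single_space, double_space, space_then_newline, double_before_r):
    print(repr(sample),
          looks_like_reference(STRICT, sample),
          looks_like_reference(PATCHED, sample))
```

Note that the patched pattern still expects a single whitespace character
before the "R", so extra whitespace in that position is not recognised by
this check.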
-- System Information:
Debian Release: squeeze/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.30-2-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages python-pypdf depends on:
ii python-support 1.0.6 automated rebuilding support for P
python-pypdf recommends no packages.
python-pypdf suggests no packages.
-- no debconf information
-- debsums errors found:
debsums: changed file /usr/share/python-support/python-pypdf/pyPdf/pdf.py (from
python-pypdf package)
debsums: changed file /usr/share/python-support/python-pypdf/pyPdf/generic.py
(from python-pypdf package)
# patch to pypdf to tolerate whitespace in cases like this
# (generated by Exstream Dialogue 6.1.015):
#
# 9 0 obj
# <<
# /Type /Font
# /Subtype /Type1
# /Encoding 4 0 R
# /BaseFont /Times-Bold
# >> endobj
diff -urb orig/generic.py new/generic.py
--- orig/generic.py 2009-12-29 23:09:18.556359182 -0500
+++ new/generic.py 2009-12-29 23:07:10.780361180 -0500
@@ -35,7 +35,7 @@
__author_email__ = "[email protected]"
import re
-from utils import readNonWhitespace, RC4_encrypt
+from utils import readNonWhitespace, readUntilWhitespace, RC4_encrypt
import filters
import utils
import decimal
@@ -81,7 +81,7 @@
return NumberObject.readFromStream(stream)
peek = stream.read(20)
stream.seek(-len(peek), 1) # reset to start
- if re.match(r"(\d+)\s(\d+)\sR[^a-zA-Z]", peek) != None:
+ if re.match(r"(\d+)\s+(\d+)\sR[^a-zA-Z]", peek) != None:
return IndirectObject.readFromStream(stream, pdf)
else:
return NumberObject.readFromStream(stream)
@@ -183,19 +183,10 @@
stream.write("%s %s R" % (self.idnum, self.generation))
def readFromStream(stream, pdf):
- idnum = ""
- while True:
- tok = stream.read(1)
- if tok.isspace():
- break
- idnum += tok
- generation = ""
- while True:
- tok = stream.read(1)
- if tok.isspace():
- break
- generation += tok
- r = stream.read(1)
+ idnum = readUntilWhitespace(stream)
+ readNonWhitespace(stream); stream.seek(-1, 1)
+ generation = readUntilWhitespace(stream)
+ r = readNonWhitespace(stream)
if r != "R":
raise utils.PdfReadError("error reading indirect object reference")
return IndirectObject(int(idnum), int(generation), pdf)
diff -urb orig/pdf.py new/pdf.py
--- orig/pdf.py 2009-12-29 23:09:17.632359905 -0500
+++ new/pdf.py 2009-12-29 23:11:00.444359823 -0500
@@ -586,10 +586,13 @@
# tables that are off by whitespace bytes.
readNonWhitespace(stream); stream.seek(-1, 1)
idnum = readUntilWhitespace(stream)
+ readNonWhitespace(stream); stream.seek(-1, 1)
generation = readUntilWhitespace(stream)
- obj = stream.read(3)
- readNonWhitespace(stream)
- stream.seek(-1, 1)
+ readNonWhitespace(stream); stream.seek(-1, 1)
+ obj_token = stream.read(3)
+ if obj_token != 'obj':
+ raise utils.PdfReadError("Error reading object header")
+ readNonWhitespace(stream); stream.seek(-1, 1)
return int(idnum), int(generation)
def cacheIndirectObject(self, generation, idnum, obj):
--- End Message ---
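
The token-reading approach used in the patch above (skip whitespace, read a
token, skip whitespace again) can be sketched on its own as follows. This is
not the pyPdf code: the helper names read_until_whitespace,
read_non_whitespace, skip_whitespace and read_reference are stand-ins written
for this example, it works on Python 3 byte streams, and it raises ValueError
where pyPdf would raise utils.PdfReadError.

```python
import io

# White-space bytes as defined by the PDF format: NUL, HT, LF, FF, CR, SP.
WHITESPACE = b"\x00\t\n\x0c\r "


def read_until_whitespace(stream):
    """Collect bytes up to the next whitespace byte (which is consumed) or EOF."""
    token = b""
    while True:
        byte = stream.read(1)
        if not byte or byte in WHITESPACE:
            return token
        token += byte


def read_non_whitespace(stream):
    """Skip whitespace and return the first non-whitespace byte (b"" at EOF)."""
    while True:
        byte = stream.read(1)
        if not byte or byte not in WHITESPACE:
            return byte


def skip_whitespace(stream):
    """Advance to the next non-whitespace byte without consuming it."""
    if read_non_whitespace(stream):
        stream.seek(-1, 1)


def read_reference(stream):
    """Parse 'idnum generation R' with any amount of whitespace between tokens."""
    skip_whitespace(stream)
    idnum = read_until_whitespace(stream)
    skip_whitespace(stream)
    generation = read_until_whitespace(stream)
    marker = read_non_whitespace(stream)
    if marker != b"R":
        raise ValueError("error reading indirect object reference")
    return int(idnum), int(generation)


print(read_reference(io.BytesIO(b"4 0 R")))
print(read_reference(io.BytesIO(b"9  \n 0\n\nR")))
```

Skipping whitespace before every token, rather than assuming a single
separator, is what lets both inputs parse to the same kind of result.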
--- Begin Message ---
Source: python-pypdf
Source-Version: 1.13-1
We believe that the bug you reported is fixed in the latest version of
python-pypdf, which is due to be installed in the Debian FTP archive:
python-pypdf_1.13-1.diff.gz
to main/p/python-pypdf/python-pypdf_1.13-1.diff.gz
python-pypdf_1.13-1.dsc
to main/p/python-pypdf/python-pypdf_1.13-1.dsc
python-pypdf_1.13-1_all.deb
to main/p/python-pypdf/python-pypdf_1.13-1_all.deb
python-pypdf_1.13.orig.tar.gz
to main/p/python-pypdf/python-pypdf_1.13.orig.tar.gz
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to [email protected],
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Luciano Bello <[email protected]> (supplier of updated python-pypdf package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing [email protected])
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Format: 1.8
Date: Fri, 13 May 2011 10:06:14 -0300
Source: python-pypdf
Binary: python-pypdf
Architecture: source all
Version: 1.13-1
Distribution: unstable
Urgency: low
Maintainer: Debian Python Modules Team <[email protected]>
Changed-By: Luciano Bello <[email protected]>
Description:
python-pypdf - PDF toolkit implemented solely in Python
Closes: 563443 567312 593574 615961
Changes:
python-pypdf (1.13-1) unstable; urgency=low
.
* New upstream release (Closes: #615961).
- DeprecationWarning in the sets fixed (Closes: #593574)
* New Standards-Version.
* Typo in the README file fixed (Closes: #567312)
* Better extra spaces handling (Closes: #563443)
Checksums-Sha1:
e6d53a26ffb597aefd1fb3a499376f521da85f3a 1188 python-pypdf_1.13-1.dsc
ba7aed11cf21a2c218df2e3979be5eb90992dcbe 35699 python-pypdf_1.13.orig.tar.gz
6d66b9f8333fb3e1dbf9796dacf5991ea779de3b 22326 python-pypdf_1.13-1.diff.gz
7a9aeca869f7747aeb852f7837fc128d7b51bb33 35186 python-pypdf_1.13-1_all.deb
Checksums-Sha256:
6375a7570b51d1108d3d8cf2385814c8128db22523b82093622230fc9a2cb9a8 1188 python-pypdf_1.13-1.dsc
3aede4c3c9c6ad07c98f059f90db0b09ed383f7c791c46100f649e1cabda0e3b 35699 python-pypdf_1.13.orig.tar.gz
971f94e22e5a898b2ed6e62b783155c2736dcacfce18bfd8e164542fd1b5ab37 22326 python-pypdf_1.13-1.diff.gz
a5a0596bd8dfc55c5b0a3126331e8f043c63980bf4f17e025c6055843f4752ce 35186 python-pypdf_1.13-1_all.deb
Files:
9364eb40712d5ada29ecc79e34506529 1188 python optional python-pypdf_1.13-1.dsc
7a75ef56f227b78ae62d6e38d4b6b1da 35699 python optional python-pypdf_1.13.orig.tar.gz
137a68cfe18975e18642a21aef3c6be6 22326 python optional python-pypdf_1.13-1.diff.gz
9ac13e922aeeb6ab16e130c952aac621 35186 python optional python-pypdf_1.13-1_all.deb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEARECAAYFAk3NTXoACgkQQWTRs4lLtHlSEACdHQp9xPNVfjPWgTm5tOR1Rxy5
bR8An3F3hmpnVGgrVb3P9vxP7+9WuP9l
=5fwv
-----END PGP SIGNATURE-----
--- End Message ---