Public bug reported:
Ubuntu Release
=============
Ubuntu 14.04.3
Package Version
============
python-pdfminer:
  Installed: 20110515+dfsg-1
  Candidate: 20110515+dfsg-1
  Version table:
 *** 20110515+dfsg-1 0
        500 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status
Expectation
=========
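# Get the problem PDF (the URL is quoted so the shell does not interpret '&')
wget -O CornwallPlanningPlanning12319497.pdf 'http://docs.planning.cornwall.gov.uk/rpp/showimage.asp?j=PA14/04815&index=12319497&DB=8&DT=4'
# Try to extract the text
pdf2txt CornwallPlanningPlanning12319497.pdf
# The PDF file's text should be printed to the console.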
What happened instead
==================
Python raises ValueError:
Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 101, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/bin/pdf2txt", line 95, in main
    caching=caching, check_extractable=True)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_pdf
    interpreter.process_page(page)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 757, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 768, in render_contents
    self.init_resources(resources)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 339, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 193, in get_font
    font = self.get_font(None, subspec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 184, in get_font
    font = PDFCIDFont(self, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 637, in __init__
    CMapParser(self.unicode_map, StringIO(strm.get_data())).run()
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 292, in run
    self.nextobject()
  File "/usr/lib/python2.7/dist-packages/pdfminer/psparser.py", line 584, in nextobject
    self.do_keyword(pos, token)
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 311, in do_keyword
    ((_,k),(_,v)) = self.pop(2)
ValueError: need more than 0 values to unpack
Potential patch [not checked if semantically correct]
==========================================
In cmapdb.py:
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except PSSyntaxError:
            pass
        return
Could become:
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except ValueError:
            pass
        except PSSyntaxError:
            pass
        return
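
For illustration, here is a minimal self-contained sketch of why the unpacking fails and what catching ValueError would do. It is not pdfminer code: TinyStack and handle_def are hypothetical names, and the pop(n) behaviour (returning whatever trailing entries exist, even if fewer than n) is an assumption inferred from the traceback, not checked against the library source.

```python
class TinyStack(object):
    """Hypothetical stand-in for the parser's operand stack (not pdfminer code)."""

    def __init__(self, items=()):
        self.items = list(items)

    def pop(self, n):
        # Assumption: return up to n trailing entries and do not complain
        # when fewer than n are available.
        objs = self.items[-n:]
        self.items[-n:] = []
        return objs


def handle_def(stack, attrs):
    """Apply a 'def' keyword; return True if an attribute was set."""
    try:
        # Each stack entry is a (position, value) pair; exactly two are needed.
        ((_, k), (_, v)) = stack.pop(2)
    except ValueError:
        # Fewer than two operands were on the stack: skip the malformed 'def'.
        return False
    attrs[k] = v
    return True


attrs = {}
ok = handle_def(TinyStack([(0, 'CMapName'), (10, 'Adobe-Identity-UCS')]), attrs)
empty = handle_def(TinyStack(), attrs)
print(ok, empty, attrs)
```

With an empty stack the unpacking raises ValueError, which the sketch swallows so that parsing could continue. Whether silently skipping such a 'def' is semantically right for the real CMap parser is still unchecked.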
** Affects: pdfminer (Ubuntu)
Importance: Undecided
Status: New
** Tags: patch patch-needswork
https://bugs.launchpad.net/bugs/1529473
Title:
pdf2text outputs uncaught error