Public bug reported:

Ubuntu Release
=============
Ubuntu 14.04.3

Package Version
============
python-pdfminer:
  Installed: 20110515+dfsg-1
  Candidate: 20110515+dfsg-1
  Version table:
 *** 20110515+dfsg-1 0
        500 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status

Expectation
=========
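# Download the problem PDF (the URL must be quoted because it contains '&')
wget -O CornwallPlanningPlanning12319497.pdf \
  'http://docs.planning.cornwall.gov.uk/rpp/showimage.asp?j=PA14/04815&index=12319497&DB=8&DT=4'
# Try to extract the text
pdf2txt CornwallPlanningPlanning12319497.pdf

Expected result: the PDF's text is printed to the console.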

What happened instead
==================
pdf2txt aborts with an uncaught ValueError:

Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 101, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/bin/pdf2txt", line 95, in main
    caching=caching, check_extractable=True)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_pdf
    interpreter.process_page(page)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 757, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 768, in render_contents
    self.init_resources(resources)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 339, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 193, in get_font
    font = self.get_font(None, subspec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 184, in get_font
    font = PDFCIDFont(self, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 637, in __init__
    CMapParser(self.unicode_map, StringIO(strm.get_data())).run()
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 292, in run
    self.nextobject()
  File "/usr/lib/python2.7/dist-packages/pdfminer/psparser.py", line 584, in nextobject
    self.do_keyword(pos, token)
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 311, in do_keyword
    ((_,k),(_,v)) = self.pop(2)
ValueError: need more than 0 values to unpack
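The crash can be illustrated without pdfminer or the PDF. This is a minimal Python sketch with hypothetical stand-ins, not pdfminer's code: pop() is assumed to return at most n items from the operand stack (consistent with the "need more than 0 values to unpack" message), and handle_def() mirrors the unpatched 'def' branch, which only catches PSSyntaxError.

```python
# Illustration of the crash in the CMap parser's 'def' handling, without pdfminer.
# All names here are hypothetical stand-ins, not pdfminer's actual code.


class PSSyntaxError(Exception):
    """Stand-in for pdfminer.psparser.PSSyntaxError."""


def pop(stack, n):
    """Remove and return up to the last n items of `stack` (may return fewer)."""
    objs = stack[-n:]
    del stack[-n:]
    return objs


def handle_def(stack):
    """Mirrors the unpatched 'def' branch: only PSSyntaxError is caught."""
    try:
        ((_, k), (_, v)) = pop(stack, 2)
        return (k, v)
    except PSSyntaxError:
        return None


# Well-formed '/CMapName /Adobe-Identity-UCS def': two (pos, token) pairs.
print(handle_def([(0, "CMapName"), (10, "Adobe-Identity-UCS")]))

# 'def' with an empty operand stack: pop() returns [] and the unpacking fails
# with ValueError, which the 'except PSSyntaxError' clause lets through.
try:
    handle_def([])
except ValueError as exc:
    print("uncaught ValueError: %s" % exc)
```

ValueError is not a subclass of PSSyntaxError, so the existing except clause does not stop it and the exception escapes the parser.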

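Potential patch (not checked for semantic correctness)
======================================================
The 'def' branch of CMapParser.do_keyword in cmapdb.py only catches
PSSyntaxError. When 'def' is reached with fewer than two operands on the
stack, self.pop(2) returns fewer than two items (here none, hence "need more
than 0 values to unpack"), so the tuple unpacking raises ValueError, which is
not caught. The current code is: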
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except PSSyntaxError:
            pass
        return

Could become (also catching ValueError):
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except (ValueError, PSSyntaxError):
            # ValueError: fewer than two operands were on the stack
            pass
        return
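A sketch of the proposed handling, again with hypothetical stand-ins (PSSyntaxError, FakeCMap, pop, handle_def_patched) rather than pdfminer's classes; plain strings replace the literal_name(k) call, and pop() is assumed to return at most n items. A malformed 'def' is skipped while a well-formed one still sets the attribute.

```python
# Sketch of the proposed 'def' handling, using hypothetical stand-ins
# (PSSyntaxError, FakeCMap, pop) instead of pdfminer's real classes.


class PSSyntaxError(Exception):
    """Stand-in for pdfminer.psparser.PSSyntaxError."""


class FakeCMap(object):
    """Hypothetical CMap that just records attributes set via 'def'."""

    def __init__(self):
        self.attrs = {}

    def set_attr(self, k, v):
        self.attrs[k] = v


def pop(stack, n):
    """Remove and return up to the last n items of `stack` (may return fewer)."""
    objs = stack[-n:]
    del stack[-n:]
    return objs


def handle_def_patched(cmap, stack):
    """Patched 'def' branch: a short operand stack is skipped, not fatal."""
    try:
        ((_, k), (_, v)) = pop(stack, 2)
        cmap.set_attr(k, v)
    except (ValueError, PSSyntaxError):
        # Malformed 'def' (e.g. nothing on the stack): ignore it, as the
        # existing code already does for PSSyntaxError.
        pass


cmap = FakeCMap()
handle_def_patched(cmap, [(0, "CMapName"), (10, "Adobe-Identity-UCS")])
handle_def_patched(cmap, [])  # the unpatched branch raised ValueError here
print(cmap.attrs)  # prints {'CMapName': 'Adobe-Identity-UCS'}
```

Because the try block also wraps set_attr, a ValueError raised there would be hidden as well; a narrower variant would check that the popped list has two items before unpacking.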

** Affects: pdfminer (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: patch patch-needswork

-- 
https://bugs.launchpad.net/bugs/1529473

Title:
  pdf2text outputs uncaught error
