Bug#932044: calibre: PDF to EPUB conversion failed with "No module named html2text"

2019-07-15 Thread Vincas Dargis

On 2019-07-15 03:36, Norbert Preining wrote:

> Can you try installing
>    python-html2text
> And see if that fixes the problem?


Yes, it did the trick!
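
For anyone hitting the same error, a quick way to confirm the fix is to check that the interpreter can import the module. This is only a minimal sketch, assuming it is run with the same Python interpreter that calibre uses (Python 2 for this calibre 3.x package); `module_available` is a hypothetical helper name, not part of calibre:

```python
def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        __import__(name)
    except ImportError:
        # Raised when the module is not installed for this interpreter.
        return False
    return True


if __name__ == "__main__":
    # "html2text" is the module named in the ImportError of this report.
    for name in ("html2text",):
        status = "found" if module_available(name) else "MISSING"
        print("{}: {}".format(name, status))
```

If the module is reported as missing, installing python-html2text as suggested above should make the check pass.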



Bug#932044: calibre: PDF to EPUB conversion failed with "No module named html2text"

2019-07-14 Thread Norbert Preining
Can you try installing
   python-html2text
And see if that fixes the problem? I suspect that there are insufficient
dependencies declared.

Thanks

Norbert

(Away from PC so cannot check myself atm)

On July 14, 2019 7:59:32 PM GMT+09:00, Vincas Dargis  wrote:
>Package: calibre
>Version: 3.45.2+dfsg-1
>Severity: normal

Bug#932044: calibre: PDF to EPUB conversion failed with "No module named html2text"

2019-07-14 Thread Norbert Preining
Hi Vincas,

Thanks for the report. I'll look into it ASAP, most probably tomorrow.

Best

Norbert

On July 14, 2019 7:59:32 PM GMT+09:00, Vincas Dargis  wrote:
>Package: calibre
>Version: 3.45.2+dfsg-1
>Severity: normal

Bug#932044: calibre: PDF to EPUB conversion failed with "No module named html2text"

2019-07-14 Thread Vincas Dargis
Package: calibre
Version: 3.45.2+dfsg-1
Severity: normal

Dear Maintainer,

I tried to convert the freely available "Elements of Programming" PDF
(from http://elementsofprogramming.com) into EPUB and got this error:

```
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/ebooks/oeb/reader.py", line 198, in _manifest_add_missing
    data = item.data
  File "/usr/lib/calibre/calibre/ebooks/oeb/base.py", line 1043, in data
    data = self._parse_xhtml(data)
  File "/usr/lib/calibre/calibre/ebooks/oeb/base.py", line 960, in _parse_xhtml
    filename=fname, non_html_file_tags={'ncx'})
  File "/usr/lib/calibre/calibre/ebooks/oeb/parse_utils.py", line 207, in parse_html
    data = preprocessor(data)
  File "/usr/lib/calibre/calibre/ebooks/conversion/preprocess.py", line 684, in __call__
    html = preprocessor(html)
  File "/usr/lib/calibre/calibre/ebooks/conversion/utils.py", line 784, in __call__
    html = self.markup_chapters(html, self.totalwords, self.blanks_between_paragraphs)
  File "/usr/lib/calibre/calibre/ebooks/conversion/utils.py", line 334, in markup_chapters
    html = recurse_patterns(html, False)
  File "/usr/lib/calibre/calibre/ebooks/conversion/utils.py", line 329, in recurse_patterns
    html = chapdetect.sub(self.chapter_head, html)
  File "/usr/lib/calibre/calibre/ebooks/conversion/utils.py", line 63, in chapter_head
    txt_chap = delete_quotes.sub('', delete_whitespace.sub('\\g', html2text(chap)))
  File "/usr/lib/calibre/calibre/utils/html2text.py", line 8, in html2text
    from html2text import HTML2Text
ImportError: No module named html2text

Spine item 'id1' not found
Traceback (most recent call last):
  File "/usr/bin/calibre-parallel", line 20, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/utils/ipc/worker.py", line 200, in main
    result = func(*args, **kwargs)
  File "/usr/lib/calibre/calibre/gui2/convert/gui_conversion.py", line 42, in gui_convert_override
    override_input_metadata=True)
  File "/usr/lib/calibre/calibre/gui2/convert/gui_conversion.py", line 27, in gui_convert
    plumber.run()
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1121, in run
    for_regex_wizard=self.for_regex_wizard, removed_items=getattr(self.input_plugin, 'removed_items_to_ignore', ()))
  File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1315, in create_oebbook
    reader()(oeb, path_or_stream)
  File "/usr/lib/calibre/calibre/ebooks/oeb/reader.py", line 71, in __call__
    self._all_from_opf(opf)
  File "/usr/lib/calibre/calibre/ebooks/oeb/reader.py", line 703, in _all_from_opf
    self._spine_from_opf(opf)
  File "/usr/lib/calibre/calibre/ebooks/oeb/reader.py", line 348, in _spine_from_opf
    raise OEBError("Spine is empty")
calibre.ebooks.oeb.base.OEBError: Spine is empty
```
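
A note on reading the log above: the `ImportError` is raised inside `html2text()`, at the `from html2text import HTML2Text` line, so the missing module only surfaces once the conversion reaches chapter detection. The second traceback then looks like a consequence of the first. The toy sketch below illustrates that pattern under the assumption that the failed item is skipped and the spine ends up empty. This has not been checked against the calibre source, and `convert_fragment`, `build_spine` and the module name are hypothetical:

```python
def convert_fragment(markup):
    # Deferred import: it is resolved only when the function is called,
    # so a missing dependency goes unnoticed until this code path runs.
    from surely_not_an_installed_module_xyz123 import render  # hypothetical
    return render(markup)


def build_spine(items):
    """Convert each (item_id, markup) pair, skipping items that fail."""
    spine = []
    for item_id, markup in items:
        try:
            spine.append((item_id, convert_fragment(markup)))
        except ImportError as exc:
            # The first error is reported, and the item is dropped.
            print("skipping {!r}: {}".format(item_id, exc))
    if not spine:
        # With every item dropped, a second, less informative error follows.
        raise ValueError("Spine is empty")
    return spine
```

Calling `build_spine([("id1", "<p>text</p>")])` prints the import failure and then raises `ValueError("Spine is empty")`, which mirrors the order of the two errors in the log.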

Am I expected to install additional Calibre dependencies manually, or..?

Full conversion log is attached.

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (1, 'experimental-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-5-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=lt_LT.UTF-8, LC_CTYPE=lt_LT.UTF-8 (charmap=UTF-8), LANGUAGE=lt (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages calibre depends on:
ii  calibre-bin                      3.45.2+dfsg-1
ii  fonts-liberation                 1:1.07.4-10
ii  imagemagick                      8:6.9.10.23+dfsg-2.1
ii  imagemagick-6.q16 [imagemagick]  8:6.9.10.23+dfsg-2.1
ii  libjpeg-turbo-progs              1:1.5.2-2+b1
ii  libjs-coffeescript               1.12.8~dfsg-4
ii  libjs-mathjax                    2.7.4+dfsg-1
ii  optipng                          0.7.7-1
ii  poppler-utils                    0.71.0-5
ii  python-apsw                      3.27.2-r1-1
ii  python-bs4                       4.7.1-1
ii  python-chardet                   3.0.4-3
ii  python-cherrypy3                 8.9.1-2
ii  python-css-parser                1.0.4-1
ii  python-cssselect                 1.0.3-1
ii  python-cssutils                  1.0.2-2
ii  python-dateutil                  2.7.3-3
ii  python-dbus                      1.2.8-3
ii  python-feedparser                5.2.1-1
ii  python-html5-parser              0.4.5-1
ii  python-html5lib                  1.0.1-1
ii  python-lxml                      4.3.3-2
ii  python-markdown                  3.0.1-3
ii  python-mechanize                 1:0.2.5-3
ii  python-msgpack                   0.5.6-1+b1
ii  python-netifaces                 0.10.4-1+b1
ii  python-pil                       6.1.0-1
ii  python-pkg-resources             41.0.1-1
ii  python-pyparsing                 2.2.0+dfsg1-2
ii  python-pyqt5                     5.11.3+dfsg-1+b3
ii