Public bug reported:
Currently it stops conversion on any decode error:
$ html2markdown broken_text
Traceback (most recent call last):
  File "/usr/bin/html2markdown", line 9, in <module>
    load_entry_point('html2text==3.200.3', 'console_scripts', 'html2text')()
  File "/usr/lib/python3/dist-packages/html2text.py", line 781, in main
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 4: invalid start byte
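For illustration, the failure can be reproduced without html2text. The sketch below uses a hypothetical sample whose byte at index 4 is 0x91, matching the traceback, and compares Python's built-in `strict`, `ignore` and `replace` error handlers for `bytes.decode()`. `decode_with` and `sample` are illustrative names, not part of html2text.

```python
def decode_with(data: bytes, errors: str) -> str:
    """Decode UTF-8 bytes with one of Python's built-in codec error handlers."""
    return data.decode("utf-8", errors=errors)


# Hypothetical input: 0x91 sits at index 4, as in the traceback above.
# 0x91 and 0x92 are UTF-8 continuation bytes, so neither can start a character.
sample = b"text\x91quoted\x92"

try:
    # "strict" is what a bare data.decode(encoding) uses.
    decode_with(sample, "strict")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x91 in position 4: invalid start byte

print(decode_with(sample, "ignore"))  # textquoted  (bad bytes silently dropped)
print(decode_with(sample, "replace"))  # each bad byte becomes U+FFFD
```

`ignore` discards the offending bytes without any trace, while `replace` leaves a visible U+FFFD marker where each one was. Bytes 0x91 and 0x92 are curly single quotes in Windows-1252, so when the input's real encoding is known, passing that encoding avoids losing characters at all. Whether that applies to these files is an assumption, since the report does not say.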
But for the files I'm working on, it would be perfectly fine to simply use
    data = data.decode(encoding, errors='ignore')
instead. The error-handling behaviour could be exposed as an option.
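Here is a minimal sketch of how such an option might be wired up, assuming a command-line front end built on `argparse`. The `--decode-errors` flag and the `build_parser`, `read_bytes` and `decode_input` helpers are hypothetical names chosen for illustration, not html2text's actual interface. Defaulting to `strict` keeps the current behaviour unless the user opts in.

```python
import argparse
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a decode-errors option")
    parser.add_argument("filename", nargs="?", help="input file (default: stdin)")
    parser.add_argument("--encoding", default="utf-8", help="input encoding")
    # Hypothetical flag: forwarded unchanged as the `errors` argument of bytes.decode().
    parser.add_argument(
        "--decode-errors",
        dest="decode_errors",
        default="strict",  # preserve the current fail-fast behaviour by default
        choices=["strict", "ignore", "replace"],
        help="how to handle undecodable bytes (default: %(default)s)",
    )
    return parser


def read_bytes(filename: Optional[str]) -> bytes:
    """Read raw bytes from a file, or from stdin when no filename is given."""
    if filename:
        with open(filename, "rb") as f:
            return f.read()
    return sys.stdin.buffer.read()


def decode_input(data: bytes, args: argparse.Namespace) -> str:
    # Same call as in the traceback, with the error handler made configurable.
    return data.decode(args.encoding, errors=args.decode_errors)
```

A front end built this way could be invoked with `--decode-errors=ignore` to get the one-line change proposed above. The plain invocation would still fail loudly on malformed input, so nobody loses data without asking for it.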
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: python3-html2text 3.200.3-2
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
CurrentDesktop: KDE
Date: Sat May 10 21:02:23 2014
PackageArchitecture: all
SourcePackage: python-html2text
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: python-html2text (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug trusty
--
https://bugs.launchpad.net/bugs/1318227
Title:
Allow selecting decode errors behaviour
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs