Public bug reported:

Currently it stops conversion on any decode error:

$ html2markdown broken_text 
Traceback (most recent call last):
  File "/usr/bin/html2markdown", line 9, in <module>
    load_entry_point('html2text==3.200.3', 'console_scripts', 'html2text')()
  File "/usr/lib/python3/dist-packages/html2text.py", line 781, in main
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 4: invalid start byte

For the files I'm working on, though, it would be perfectly fine to skip
the undecodable bytes by changing that line to:

data = data.decode(encoding, errors='ignore')

The error handling mode could be exposed as a command-line option.
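
A minimal sketch of what such an option might look like, not a patch
against html2text itself. The helper decode_input, the parser, and the
--decode-errors flag name are assumptions made up for illustration; only
bytes.decode() and its standard 'strict', 'ignore' and 'replace' error
handlers come from Python itself.

```python
import argparse


def decode_input(data: bytes, encoding: str = "utf-8", errors: str = "strict") -> str:
    """Decode raw input, letting the caller choose how bad bytes are handled.

    Hypothetical helper: 'errors' is passed straight to bytes.decode(), so
    'strict' raises UnicodeDecodeError, 'ignore' drops undecodable bytes and
    'replace' substitutes U+FFFD for them.
    """
    return data.decode(encoding, errors=errors)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --decode-errors option."""
    parser = argparse.ArgumentParser(description="decode error handling sketch")
    parser.add_argument(
        "--decode-errors",
        dest="decode_errors",
        default="strict",  # keep the current behaviour unless asked otherwise
        choices=("strict", "ignore", "replace"),
        help="what to do when the input cannot be decoded",
    )
    return parser
```

With something like this, main() could call
decode_input(data, encoding, args.decode_errors) in place of the plain
data.decode(encoding), and the default would keep today's strict
behaviour for everyone who does not pass the flag.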

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: python3-html2text 3.200.3-2
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
CurrentDesktop: KDE
Date: Sat May 10 21:02:23 2014
PackageArchitecture: all
SourcePackage: python-html2text
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: python-html2text (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug trusty

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1318227

Title:
  Allow selecting decode errors behaviour

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/python-html2text/+bug/1318227/+subscriptions
