Package: python3-feedparser
Version: 5.2.1-1
Severity: normal

Dear maintainer(s),

The attached script uses feedparser to parse an invalid XHTML document.

If feedparser is installed from PyPI with pip, then the script
exits without error.

If feedparser is installed from the Debian 10 repositories (or Arch Linux,
I am told), it fails with: "TypeError: startswith first arg must be bytes
or a tuple of bytes, not str" (full traceback attached).

In all cases, feedparser 5.2.1 is used (5.2.1-1 on Debian).


I did not investigate further, but this might be caused by a different
version of sgmllib (bundled in Debian's python3-feedparser package).
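
For illustration only: the error message implies that the value being
checked is a bytes object while the prefix is a str. The sketch below
reproduces that message outside feedparser. The only line taken from
the traceback is the startswith('</') check; the helper names
(starts_with_close_tag, try_check) are hypothetical, and this does not
show where the bytes value comes from in the Debian package.

```python
# Minimal sketch, not feedparser code: mimic the check from the traceback
# (`piece.startswith('</')`) with a str and with a bytes argument.


def starts_with_close_tag(piece):
    """Same check as in the traceback; the prefix is always a str."""
    return piece.startswith('</')


def try_check(piece):
    """Return the result of the check, or the TypeError message if it fails."""
    try:
        return starts_with_close_tag(piece)
    except TypeError as exc:
        return str(exc)


# With a str the check works; with bytes it raises the TypeError
# reported above, and the message is returned instead.
print(try_check('</p>'))
print(try_check(b'</p>'))
```

If that reading is right, the pieces handled by pop() are str with the
PyPI install and bytes (at least on this code path) with the Debian
package.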



-- System Information:
Debian Release: 10.2
  APT prefers oldstable-debug
  APT policy: (500, 'oldstable-debug'), (500, 'stable'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: armhf

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_DIE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8), LANGUAGE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages python3-feedparser depends on:
ii  python3  3.7.3-1

python3-feedparser recommends no packages.

python3-feedparser suggests no packages.

-- no debconf information
import feedparser

data = '''<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>

<entry> 
    <content type='xhtml'><div xmlns='http://www.w3.org/1999/xhtml'>
<p><i></p>
    </div></content> 
</entry>
<entry> 
    <content type='xhtml'><div xmlns='http://www.w3.org/1999/xhtml'>
<p>&#8482;</p>
    </div></content> 
</entry>
</feed>
'''

feedparser.parse(data)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/feedparser_debian/sgmllib3.py", line 352, in finish_endtag
    method = getattr(self, 'end_' + tag)
AttributeError: '_LooseFeedParser' object has no attribute 'end_content'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "feedparser_invalid_xhtml.py", line 19, in <module>
    feedparser.parse(data)
  File "/usr/lib/python3/dist-packages/feedparser.py", line 3972, in parse
    feedparser.feed(data.decode('utf-8', 'replace'))
  File "/usr/lib/python3/dist-packages/feedparser.py", line 2131, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python3/dist-packages/feedparser_debian/sgmllib3.py", line 98, in feed
    self.goahead(0)
  File "/usr/lib/python3/dist-packages/feedparser_debian/sgmllib3.py", line 137, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python3/dist-packages/feedparser_debian/sgmllib3.py", line 314, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python3/dist-packages/feedparser_debian/sgmllib3.py", line 354, in finish_endtag
    self.unknown_endtag(tag)
  File "/usr/lib/python3/dist-packages/feedparser.py", line 704, in unknown_endtag
    method()
  File "/usr/lib/python3/dist-packages/feedparser.py", line 1840, in _end_content
    value = self.popContent('content')
  File "/usr/lib/python3/dist-packages/feedparser.py", line 1011, in popContent
    value = self.pop(tag)
  File "/usr/lib/python3/dist-packages/feedparser.py", line 863, in pop
    if piece.startswith('</'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
