Public bug reported:
Trying to parse the RSS feed http://www.projekt6.de/?feed=podcast with
feedparser yields the following traceback:
[EMAIL PROTECTED]:~$ python
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> f = feedparser.parse('http://www.projekt6.de/?feed=podcast')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/lib/python-support/python2.5/feedparser.py", line 2624, in parse
    feedparser.feed(data)
  File "/var/lib/python-support/python2.5/feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 138, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 315, in parse_endtag
    self.finish_endtag(tag)
  File "/usr/lib/python2.5/sgmllib.py", line 355, in finish_endtag
    self.unknown_endtag(tag)
  File "/var/lib/python-support/python2.5/feedparser.py", line 476, in unknown_endtag
    method()
  File "/var/lib/python-support/python2.5/feedparser.py", line 1318, in _end_content
    value = self.popContent('content')
  File "/var/lib/python-support/python2.5/feedparser.py", line 700, in popContent
    value = self.pop(tag)
  File "/var/lib/python-support/python2.5/feedparser.py", line 641, in pop
    output = _resolveRelativeURIs(output, self.baseuri, self.encoding)
  File "/var/lib/python-support/python2.5/feedparser.py", line 1594, in _resolveRelativeURIs
    p.feed(htmlSource)
  File "/var/lib/python-support/python2.5/feedparser.py", line 1441, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib/python2.5/sgmllib.py", line 99, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/sgmllib.py", line 133, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/sgmllib.py", line 291, in parse_starttag
    self.finish_starttag(tag, attrs)
  File "/usr/lib/python2.5/sgmllib.py", line 333, in finish_starttag
    self.unknown_starttag(tag, attrs)
  File "/var/lib/python-support/python2.5/feedparser.py", line 1589, in unknown_starttag
    _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)
  File "/var/lib/python-support/python2.5/feedparser.py", line 1458, in unknown_starttag
    value = unicode(value, self.encoding)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 3-8: unsupported Unicode code range
I've created a patch against the most recent feedparser.py in Ubuntu
8.04 that fixes this problem by replacing invalid characters instead
of failing completely.
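
The patch itself is not reproduced in this report, so the following is
only a minimal sketch of the general approach described above: decode
with the "replace" error handler so that undecodable bytes become
U+FFFD instead of raising UnicodeDecodeError. The helper name
decode_lenient is hypothetical and is not part of feedparser. The
sketch is written for Python 3; on Python 2.5 the comparable call
would be unicode(value, encoding, 'replace').

```python
def decode_lenient(value, encoding="utf-8"):
    """Decode bytes to text, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration only; not taken from the
    actual feedparser patch.
    """
    if isinstance(value, str):
        # Already text, nothing to decode.
        return value
    # The "replace" error handler substitutes U+FFFD for every byte
    # sequence the codec rejects, rather than raising an exception.
    return value.decode(encoding, "replace")


# Valid input decodes exactly as it would with strict decoding.
good = decode_lenient(b"caf\xc3\xa9")

# An attribute value containing bytes that are not valid UTF-8 no
# longer aborts the whole parse; the bad bytes are replaced.
bad = decode_lenient(b"abc\xfc\x84\x80\x80\x80\x80def")
```

The trade-off is that the replaced characters are lost, but the rest
of the feed remains usable.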
** Affects: feedparser (Ubuntu)
Importance: Undecided
Status: New
--
UnicodeDecodeError when parsing http://www.projekt6.de/?feed=podcast
https://bugs.launchpad.net/bugs/252506