So I wrote a little video podcast downloading script that checks a list of RSS feeds and downloads any new videos. Every once in a while it finds a character in the feed that is outside the ASCII range (ordinal 128 or above) and my script blows up:

Traceback (most recent call last):
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 88, in <module>
    mainloop()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 75, in mainloop
    update()
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 69, in update
    couldhave = getshowlst(x[1],episodecnt)
  File "C:\Users\Adam\Desktop\Rev3 DL\Rev3.py", line 30, in getshowlst
    masterlist = XMLWorkspace.parsexml(url)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 54, in parsexml
    parse(url, FeedHandlerInst)
  File "C:\Python25\lib\xml\sax\__init__.py", line 33, in parse
    parser.parse(source)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "C:\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "C:\Users\Adam\Desktop\Rev3 DL\XMLWorkspace.py", line 51, in characters
    self.data.append(string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 236: ordinal not in range(128)

Now it's my understanding that XML can contain non-ASCII Unicode characters as long as the encoding is specified, which it is (UTF-8). The feed passes every validator I've run it through, and every program I open it with seems to be OK with it, except my Python script. Why?

Here is the URL of the feed in question: http://revision3.com/winelibraryreserve/

My script is complaining about the accented e (è, u'\xe8') in Mourvèdre.

At first glance I thought it was the data.append(string) call that wasn't accepting the Unicode, but even if I put a return in the characters handler, it still breaks. What am I doing wrong?
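
In case it helps anyone reproduce this, here is a stripped-down sketch of the kind of handler involved. It is not my real code: the TitleCollector class and the inline SAMPLE document are hypothetical stand-ins (only the word Mourvèdre comes from the feed), and it is written in Python 3 syntax rather than the Python 2.5 my script runs on. The only thing it assumes about the failure is what the error message itself names, namely that something is pushing the parsed text through the ASCII codec; I have not pinned down where in my script that happens.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical stand-in for the feed handler: gathers character data."""

    def __init__(self):
        super().__init__()
        self.data = []

    def characters(self, content):
        # The parser hands over already-decoded text, not raw UTF-8 bytes.
        self.data.append(content)


# Inline sample document instead of the real feed (assumption for illustration).
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<title>Mourv\u00e8dre</title>"
).encode("utf-8")

handler = TitleCollector()
xml.sax.parseString(SAMPLE, handler)

# Character data may arrive in several chunks, so join before using it.
text = "".join(handler.data)

# Forcing the text through the ASCII codec fails on the accented letter.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 handles the same text without complaint.
utf8_bytes = text.encode("utf-8")
```

Here the parse and the append both go through, and the error only appears at the explicit encode to ASCII, which would fit with the append line in my traceback not being the real culprit.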