I'm trying to write a little RSS feed reader, but I'm having trouble with Unicode. I would appreciate some help, as I feel I'm going round in circles.
When I get the save to work, ElementTree fails, or vice versa. You can see what I've tried from the commented-out lines. I think the problem is my understanding of Unicode, so feel free to enlighten me. What encoding is the xml string in before I do anything to it? Does my approach below make any sense?

import urllib, re, os, sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'djsite.settings'
from djsite.djapp.models import Feed
from xml.etree import ElementTree

url = 'http://www.osirra.com/rss/rss20/1'
#'http://www.michaelmoore.com/rss/mikeinthenews.xml'
#'http://www.michaelmoore.com/rss/mustread.xml'

f = urllib.urlopen(url)
xml = f.read()
f.close()

feed = Feed.objects.get(url=url)

if xml:
    ms = re.findall('\<\?xml version\=\"[^"]+\" encoding\=\"([^"]+)\"\?\>', xml)
    if ms:
        encoding = ms[0]
    else:
        encoding = 'utf-8'
    print 'using encoding:', encoding
    #xml = xml.encode(encoding, 'replace')
    ##xml = xml.decode(encoding, 'replace')
    #xml = unicode(xml, encoding)
    #xml = unicode(xml)
    elem = ElementTree.fromstring(xml)
    #do stuff with elem...
    feed.xml = xml
    feed.save()

Thanks for your time :-)

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
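
P.S. To pin down what I'm asking, here is a small self-contained sketch of my current guess at how this should work. It is an assumption on my part, not something I've confirmed: the string from f.read() is undecoded bytes in whatever encoding the feed declares, ElementTree wants those bytes as they are, and only the copy I store needs decoding to text. The sample feed and the helper name declared_encoding are made up for illustration, and I've written it in Python 3 syntax (my script above is Python 2) so it runs on its own without Django or a network connection.

```python
import re
from xml.etree import ElementTree

# Made-up sample feed (not my real one), encoded as ISO-8859-1 bytes.
# It stands in for what f.read() returns from the network.
SAMPLE_TEXT = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<rss version="2.0"><channel><title>Caf\u00e9 news</title></channel></rss>'
)
raw = SAMPLE_TEXT.encode('iso-8859-1')


def declared_encoding(data, default='utf-8'):
    """Hypothetical helper: return the encoding named in the XML
    declaration of a byte string, or the default if none is declared."""
    m = re.match(br'\s*<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode('ascii') if m else default


# 1. Hand the undecoded bytes to ElementTree; the parser reads the
#    encoding from the declaration itself.
elem = ElementTree.fromstring(raw)
title = elem.findtext('channel/title')

# 2. Decode separately to get text for storing in a text field.
encoding = declared_encoding(raw)
text = raw.decode(encoding, 'replace')

print('using encoding:', encoding)
print('title:', title)
```

If that picture is right, the value I assign to feed.xml would be the decoded text from step 2, while ElementTree keeps getting the raw bytes from step 1. Please correct me if I have this backwards.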