On Mar 8, 12:42 am, Stefan Behnel <stefan...@behnel.de> wrote:
> rpar...@gmail.com wrote:
> > I am trying to process an xml file that contains unicode characters
> > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the
> > entire content of the website into an xml file. Using
> > xml.dom.minidom, I wrote a few lines of python code to parse out the
> > xml file, but am stuck with the following error:
> >
> > >>> import xml.dom.minidom
> > >>> dom = xml.dom.minidom.parse("wordpress.2009-02-19.xml")
> > >>> titles = dom.getElementsByTagName("title")
> > >>> for title in titles:
> > ...     print "childNode = ", title.childNodes
> > ...
> > childNode = [<DOM Text node "Sanskrit N...">]
> > childNode = [<DOM Text node "Sanskrit N...">]
> > childNode = []
> > childNode = []
> > childNode = [<DOM Text node "1-1-1">]
> > childNode = Traceback (most recent call last):
> >   File "<stdin>", line 2, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > 16-18: ordinal not in range(128)
>
> That's because you are printing it out to your console, in which case you
> need to make sure it's encoded properly for printing. repr() might also help.
>
> Regarding minidom, you might be happier with the xml.etree package that
> comes with Python 2.5 and later (it's also available for older versions).
> It's a lot easier to use, more memory friendly and also much faster.
>
> Stefan

Thanks for the reply. I didn't realize that printing to console was
causing the problem. I am now able to parse out the relevant portions of
my xml file. Will also look at the xml.etree module.

Regards
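
For anyone finding this thread later, here is a minimal sketch of both suggestions combined: reading the titles with xml.etree and encoding them explicitly before printing. It is written for Python 3, whereas the session quoted above is Python 2. The SAMPLE document is a made-up stand-in for the WordPress export (the real file's layout is not shown in this thread), and extract_titles and console_safe are hypothetical helper names, not part of any library.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the export file; the real structure is an assumption.
SAMPLE = u"""<rss><channel>
<title>Sanskrit Notes</title>
<item><title>1-1-1</title></item>
<item><title>\u0935\u0943\u0926\u094d\u0927\u093f\u0903</title></item>
<item><title></title></item>
</channel></rss>"""


def extract_titles(xml_text):
    """Return the text of every <title> element, using '' for empty ones."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [(el.text or u"") for el in root.iter("title")]


def console_safe(text, encoding=None):
    """Return text with characters the console codec cannot encode escaped."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "backslashreplace" substitutes \uXXXX escapes instead of raising
    # UnicodeEncodeError when the codec cannot represent a character.
    return text.encode(encoding, "backslashreplace").decode(encoding)


for title in extract_titles(SAMPLE):
    print("title = " + console_safe(title))
```

With an ASCII-only console the Devanagari title comes out as backslash escapes rather than raising an exception; with a UTF-8 console it prints unchanged. For the real export, ET.parse("wordpress.2009-02-19.xml").getroot() would take the place of ET.fromstring, assuming the file is well-formed.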