Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Martin v. Löwis
Regarding minidom, you might be happier with the xml.etree package that comes with Python 2.5 and later (it's also available for older versions). It's a lot easier to use, more memory friendly and also much faster. OTOH, choice of XML library is completely irrelevant for the issue at hand. If
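
As a rough illustration of the xml.etree suggestion above, here is a minimal sketch of reading Devanagari text with ElementTree. The sample document and its element names (channel, item, title) are made up for the example and are not taken from the actual Wordpress export; the helper name extract_titles is likewise hypothetical. It is written for current Python 3, where xml.etree.ElementTree picks up the C accelerator on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element names are illustrative only
# and do not reflect the real Wordpress export schema.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<channel>"
    "<item><title>व्याकरणम्</title></item>"
    "<item><title>संस्कृतम्</title></item>"
    "</channel>"
).encode("utf-8")


def extract_titles(xml_bytes):
    """Return the text of every <title> element as a list of str."""
    # Passing bytes lets the parser honour the declared encoding.
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.iter("title")]


titles = extract_titles(SAMPLE)
```

The parser hands back ordinary Unicode strings, so the Devanagari text needs no special handling at this stage.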

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Stefan Behnel
Martin v. Löwis wrote: Regarding minidom, you might be happier with the xml.etree package that comes with Python 2.5 and later (it's also available for older versions). It's a lot easier to use, more memory friendly and also much faster. OTOH, choice of XML library is completely irrelevant for

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread Martin v. Löwis
For the described problem, maybe. But certainly not for the application. The background was parsing the XML dump of an entire web site, which I would expect to be larger than what minidom is designed to handle gracefully. Switching to cElementTree before major code gets written is almost

comparing (c)ElementTree and minidom (was: Parsing unicode (devanagari) text with xml.dom.minidom)

2009-03-08 Thread Stefan Behnel
Martin v. Löwis wrote: The background was parsing the XML dump of an entire web site, which I would expect to be larger than what minidom is designed to handle gracefully. Switching to cElementTree before major code gets written is almost certainly a good idea here. I think minidom is
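
For the large-dump case discussed here, one possible approach is incremental parsing, so the whole tree never has to sit in memory at once. The sketch below uses ElementTree's iterparse; the function name iter_item_titles and the tag names item and title are assumptions for illustration, not details from the thread. On Python 2.5 to 3.8 the same call was also available from xml.etree.cElementTree, which was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET


def iter_item_titles(source, item_tag="item", title_tag="title"):
    """Yield the title text of each item, parsing the input incrementally.

    `source` may be a filename or a binary file object. The tag names
    are hypothetical defaults, not the real export schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            yield elem.findtext(title_tag)
            # Drop the item's children once processed to keep memory low.
            elem.clear()


# Usage with an in-memory stand-in for a large export file.
demo = io.BytesIO(
    b"<channel>"
    b"<item><title>first</title></item>"
    b"<item><title>second</title></item>"
    b"</channel>"
)
demo_titles = list(iter_item_titles(demo))
```

Clearing each element after use is what keeps memory consumption roughly flat; without it, iterparse still builds the full tree.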

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread rparimi
On Mar 8, 12:42 am, Stefan Behnel stefan...@behnel.de wrote: rpar...@gmail.com wrote: I am trying to process an xml file that contains unicode characters (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the entire content of the website into an xml file. Using

Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-07 Thread rparimi
Hello, I am trying to process an xml file that contains unicode characters (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the entire content of the website into an xml file. Using xml.dom.minidom, I wrote a few lines of python code to parse out the xml file, but am stuck with

Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-07 Thread Stefan Behnel
rpar...@gmail.com wrote: I am trying to process an xml file that contains unicode characters (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the entire content of the website into an xml file. Using xml.dom.minidom, I wrote a few lines of python code to parse out the xml