Parsing unicode (devanagari) text with xml.dom.minidom
Hello,

I am trying to process an XML file that contains Unicode characters (see http://vyakarnam.wordpress.com/). WordPress allows exporting the entire content of the website into an XML file. Using xml.dom.minidom, I wrote a few lines of Python code to parse the XML file, but I am stuck with the following error:

>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parse("wordpress.2009-02-19.xml")
>>> titles = dom.getElementsByTagName("title")
>>> for title in titles:
...     print "childNode = ", title.childNodes
...
childNode = []
childNode = []
childNode = []
childNode = []
childNode = []
childNode = Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-18: ordinal not in range(128)
>>>

Python exited when it was trying to parse the following node:

अन्

The XML header tells me that the document is UTF-8.

I am running Python 2.5.1 on Mac OS X 10.5.6, and my locale settings are as below:

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

I googled around for similar errors and tried using unicode(), but that didn't help either:

>>> foo = unicode(titles[5].childNodes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-18: ordinal not in range(128)

I'm a novice with Unicode and am not sure how best to handle the Unicode text I'm dealing with (Devanagari). Any suggestions would be helpful.

Thanks

--
http://mail.python.org/mailman/listinfo/python-list
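The traceback names the 'ascii' codec, which points at an encoding step rather than at XML parsing. The sketch below is written for modern Python 3 (not the Python 2.5 in the post) and uses a tiny inline document as a hypothetical stand-in for the WordPress export. It shows where minidom keeps an element's text (in the .data of its Text child nodes) and reproduces the error by encoding that text as ASCII, next to a UTF-8 encoding that succeeds.

```python
import xml.dom.minidom

# Hypothetical stand-in for the WordPress export: one <title> with Devanagari text.
SAMPLE = "<rss><channel><title>अन्</title></channel></rss>".encode("utf-8")


def title_texts(xml_bytes):
    """Return the text of every <title> element as a (unicode) str."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    texts = []
    for title in dom.getElementsByTagName("title"):
        # An element's text lives in its Text child nodes, in their .data attribute.
        texts.append("".join(node.data for node in title.childNodes
                             if node.nodeType == node.TEXT_NODE))
    return texts


titles = title_texts(SAMPLE)

# Encoding with the 'ascii' codec reproduces the error from the post...
try:
    titles[0].encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# ...while UTF-8 can represent every character.
print(titles[0].encode("utf-8"))
```

Parsing itself succeeds; only the conversion of the decoded text to ASCII bytes fails.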
Re: Parsing unicode (devanagari) text with xml.dom.minidom
On Mar 8, 12:42 am, Stefan Behnel wrote:
> rpar...@gmail.com wrote:
> > I am trying to process an xml file that contains unicode characters
> > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the
> > entire content of the website into an xml file. Using
> > xml.dom.minidom, I wrote a few lines of python code to parse out the
> > xml file, but am stuck with the following error:
> > >>> import xml.dom.minidom
> > >>> dom = xml.dom.minidom.parse("wordpress.2009-02-19.xml")
> > >>> titles = dom.getElementsByTagName("title")
> > >>> for title in titles:
> > ...     print "childNode = ", title.childNodes
> > ...
> > childNode = []
> > childNode = []
> > childNode = []
> > childNode = []
> > childNode = []
> > childNode = Traceback (most recent call last):
> >   File "<stdin>", line 2, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > 16-18: ordinal not in range(128)
>
> That's because you are printing it out to your console, in which case you
> need to make sure it's encoded properly for printing. repr() might also help.
>
> Regarding minidom, you might be happier with the xml.etree package that
> comes with Python 2.5 and later (it's also available for older versions).
> It's a lot easier to use, more memory friendly and also much faster.
>
> Stefan

Thanks for the reply. I didn't realize that printing to the console was causing the problem. I am now able to parse out the relevant portions of my XML file. I will also look at the xml.etree module.

Regards
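Stefan's two suggestions (encode explicitly before printing, and try xml.etree) fit together in a short sketch. It assumes Python 3, where ascii() gives the escaped view that repr() gave for unicode strings in Python 2; the inline sample document is a hypothetical miniature, not the real export.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical miniature of a WordPress export; for the real file use
# ET.parse("wordpress.2009-02-19.xml").getroot() instead of fromstring().
SAMPLE = "<rss><channel><item><title>अन्</title></item></channel></rss>".encode("utf-8")

root = ET.fromstring(SAMPLE)
# Element.text is already a decoded str; no manual decoding is needed.
titles = [el.text for el in root.iter("title")]

# ascii() escapes non-ASCII characters, much like repr() did in Python 2,
# so this line prints safely even on an ASCII-only console.
print(ascii(titles))

# Encode explicitly and write bytes, so Python never applies sys.stdout's
# own encoding to the Devanagari text.
payload = "".join("title = %s\n" % t for t in titles).encode("utf-8")
sys.stdout.flush()  # keep the text-layer output above in order
binary_out = getattr(sys.stdout, "buffer", None)  # absent if stdout was replaced
if binary_out is not None:
    binary_out.write(payload)
    binary_out.flush()
```

Whether the UTF-8 bytes then display correctly depends on the terminal, which in the post's locale (en_US.UTF-8) should be fine.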
redirecting output of process to a file using subprocess.Popen()
I am trying to redirect stderr of a process to a temporary file and then read back the contents of the file, all in the same Python script. As a simple exercise, I launched /bin/ls, but this doesn't work:

#!/usr/bin/python
import subprocess as proc
import tempfile

name = tempfile.NamedTemporaryFile(mode='w+b')
print 'name is ' + name.name

cmd = []
cmd.append('/bin/ls')
cmd.append('-l')
cmd.append('/tmp')

p = proc.Popen(cmd, stdout=name, stderr=proc.STDOUT, close_fds=True)
while True:
    ret = p.poll()
    if (ret is not None):
        output = name.readlines()
        print 'out = ', output
        break

$ python sub.py
name is /tmp/tmpjz4NJY
out = []

I tried calling flush() on the file object, but this didn't help either. I also tried closing and re-opening the file, but closing the file object results in it getting deleted. Can the above be made to work using temp files?

thanks
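A likely explanation: the file is not empty, the read just starts in the wrong place. Popen gives the child a duplicate of the file's descriptor, and duplicated descriptors share one file offset, so once the child has written its output the offset sits at the end and readlines() returns []. If that is the cause, rewinding before reading should be enough (in the post's Python 2 code, name.seek(0) just before name.readlines()). The sketch below assumes Python 3, replaces the poll() loop with wait(), and uses a small Python child as a stand-in for /bin/ls so it runs anywhere Python does.

```python
import subprocess
import sys
import tempfile

# Stand-in command (an assumption): a child Python that writes to both streams.
CMD = [sys.executable, "-c",
       "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"]


def run_to_tempfile(cmd):
    """Run cmd with stdout and stderr sent to a temp file; return (code, text)."""
    with tempfile.NamedTemporaryFile(mode="w+b") as tmp:
        # The child writes through a duplicate of tmp's descriptor.
        p = subprocess.Popen(cmd, stdout=tmp, stderr=subprocess.STDOUT,
                             close_fds=True)
        # wait() blocks until the child exits; no busy poll() loop needed.
        returncode = p.wait()
        # The shared offset was left at the end of the child's output,
        # so rewind before reading it back.
        tmp.seek(0)
        output = tmp.read().decode()
    return returncode, output


code, out = run_to_tempfile(CMD)
print("return code:", code)
print("out =", out.splitlines())
```

The relative order of the two lines in the file can vary, since the child's stdout is block-buffered when it is not a terminal.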
fork after creating temporary file using NamedTemporaryFile
Hello pythoners,

When I create a temporary file using the tempfile module and fork() later on in my program, I always see errors when the program exits. Is this because the child process deletes the temp file? Here's a stripped-down version of my script that exhibits this problem:

#!/usr/bin/python
import os
import tempfile
import sys

cmd = []
cmd.append('/bin/ls')
cmd.append('-l')
cmd.append('/tmp')

foo = tempfile.NamedTemporaryFile(mode='w+b')

pid = os.fork()
if pid:
    print 'I am parent'
else:
    print 'I am child'
    sys.exit(0)

$ python sub.py
I am child
I am parent
Exception exceptions.OSError: (2, 'No such file or directory', '/tmp/tmp-mZTPq') in ', mode 'w+b' at 0xb7d2a578>> ignored

How can these warnings be avoided? I tried to catch this exception using try/except, but it didn't work.

thanks!
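A probable cause, consistent with the error: after fork() both processes hold the same NamedTemporaryFile wrapper, and each deletes the file when that wrapper is cleaned up at exit. Whichever process cleans up first removes it, so the other one's unlink fails with ENOENT. Because this happens inside a finalizer, Python only prints "... ignored", and a try/except around the main code cannot catch it. One common remedy is to end the child with os._exit(), which skips finalizers and exit handlers and leaves cleanup to the parent. The sketch below is POSIX-only Python 3; having the child write a line into the file is a hypothetical stand-in for real work.

```python
import os
import tempfile


def run_child_then_read():
    """Fork a child that writes into a shared temp file; only the parent deletes it."""
    with tempfile.NamedTemporaryFile(mode="w+b") as tmp:
        pid = os.fork()
        if pid == 0:
            # Child: write through the inherited descriptor, then leave with
            # os._exit(), which skips the with-block exit and all finalizers,
            # so the child never unlinks the shared temp file.
            try:
                os.write(tmp.fileno(), b"I am child\n")
                os._exit(0)
            finally:
                os._exit(1)  # only reached if the write raised
        # Parent: wait for the child, then rewind and read what it wrote.
        _, status = os.waitpid(pid, 0)
        tmp.seek(0)
        data = tmp.read()
        path = tmp.name
    # Leaving the with-block closed and deleted the file exactly once.
    return status, data, os.path.exists(path)


status, data, still_there = run_child_then_read()
print("child status:", status, "data:", data, "file left behind:", still_there)
```

Because os._exit() also skips flushing Python's stdio buffers, a child that prints (as in the post) should call sys.stdout.flush() before exiting this way.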