Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-07 Thread rparimi
Hello,

I am trying to process an XML file that contains Unicode characters
(see http://vyakarnam.wordpress.com/). WordPress allows exporting the
entire content of the website into an XML file. Using
xml.dom.minidom, I wrote a few lines of Python code to parse the
XML file, but am stuck with the following error:

>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parse("wordpress.2009-02-19.xml")
>>> titles = dom.getElementsByTagName("title")
>>> for title in titles:
...     print "childNode = ", title.childNodes
...
childNode =  []
childNode =  []
childNode =  []
childNode =  []
childNode =  []
childNode =  Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
16-18: ordinal not in range(128)
>>>

Python exited when it was trying to parse the following node:
अन् 

The xml header tells me that the document is UTF-8:


I am running Python 2.5.1 on Mac OS X 10.5.6 and my locale settings are
as below:
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=


I googled around for similar errors, and tried using unicode() but that
didn't help either:
>>> foo = unicode(titles[5].childNodes)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
16-18: ordinal not in range(128)

I'm a novice with Unicode, and am not sure how best to handle the
Unicode text I'm dealing with (Devanagari). Any suggestions will be
helpful.

Thanks
--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing unicode (devanagari) text with xml.dom.minidom

2009-03-08 Thread rparimi
On Mar 8, 12:42 am, Stefan Behnel  wrote:
> rpar...@gmail.com wrote:
> > I am trying to process an xml file that contains unicode characters
> > (see http://vyakarnam.wordpress.com/). Wordpress allows exporting the
> > entire content of the website into an xml file. Using
> > xml.dom.minidom,  I wrote a few lines of python code to parse out the
> > xml file, but am stuck with the following error:
>
> > >>> import xml.dom.minidom
> > >>> dom = xml.dom.minidom.parse("wordpress.2009-02-19.xml")
> > >>> titles = dom.getElementsByTagName("title")
> > >>> for title in titles:
> > ...    print "childNode = ", title.childNodes
> > ...
> > childNode =  []
> > childNode =  []
> > childNode =  []
> > childNode =  []
> > childNode =  []
> > childNode =  Traceback (most recent call last):
> >   File "<stdin>", line 2, in <module>
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > 16-18: ordinal not in range(128)
>
> That's because you are printing it out to your console, in which case you
> need to make sure it's encoded properly for printing. repr() might also help.
>
> Regarding minidom, you might be happier with the xml.etree package that
> comes with Python 2.5 and later (it's also available for older versions).
> It's a lot easier to use, more memory friendly and also much faster.
>
> Stefan
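
A minimal sketch of the point Stefan makes above: take the text out of
the nodes as Unicode and encode it explicitly before it reaches the
console, instead of printing the node list itself. It is written for
current Python 3 rather than the 2.5.1 used in this thread, and
SAMPLE_XML and minidom_title_texts are hypothetical stand-ins, since the
real export file is not shown here.

```python
import xml.dom.minidom

# Hypothetical stand-in for the WordPress export (assumption: the real file
# is UTF-8 encoded and contains <title> elements like these).
SAMPLE_XML = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<rss><channel><title>Blog</title>'
    u'<item><title>\u0905\u0928\u094d</title></item>'
    u'</channel></rss>'
).encode("utf-8")


def minidom_title_texts(xml_bytes):
    """Return the text content of every <title> element as Unicode strings."""
    dom = xml.dom.minidom.parseString(xml_bytes)
    texts = []
    for title in dom.getElementsByTagName("title"):
        # Collect the character data of the text children rather than
        # printing the childNodes list itself.
        parts = [
            node.data
            for node in title.childNodes
            if node.nodeType in (xml.dom.Node.TEXT_NODE,
                                 xml.dom.Node.CDATA_SECTION_NODE)
        ]
        texts.append(u"".join(parts))
    return texts


titles = minidom_title_texts(SAMPLE_XML)
for text in titles:
    # Encode explicitly so the result does not depend on the console codec.
    encoded = text.encode("utf-8")
    # ascii() produces escapes only, much like repr() does on Python 2.
    print(ascii(text), len(encoded), "bytes as UTF-8")
```

On Python 2.5 the corresponding calls would be print text.encode("utf-8")
on a UTF-8 terminal, or print repr(text) for an escape-only view.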

Thanks for the reply. I didn't realize that printing to console was
causing the problem. I am now able to parse out the relevant portions
of my xml file. Will also look at the xml.etree module.
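
For reference, a minimal sketch of the same title extraction with
xml.etree, assuming current Python 3. SAMPLE_XML and etree_title_texts
are hypothetical names, and the sample document only imitates the shape
of the export file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the WordPress export file.
SAMPLE_XML = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<rss><channel><title>Blog</title>'
    u'<item><title>\u0905\u0928\u094d</title></item>'
    u'</channel></rss>'
).encode("utf-8")


def etree_title_texts(xml_bytes):
    """Return the text of every <title> element, at any depth."""
    root = ET.fromstring(xml_bytes)
    # iter() walks the whole tree; Python 2.5 spelled this getiterator().
    # An empty element has text None, so fall back to an empty string.
    return [elem.text or u"" for elem in root.iter("title")]


etree_titles = etree_title_texts(SAMPLE_XML)
# ascii() keeps the output printable on any console.
print([ascii(t) for t in etree_titles])
```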

Regards


redirecting output of process to a file using subprocess.Popen()

2008-07-09 Thread rparimi
I am trying to redirect stderr of a process to a temporary file and
then read back the contents of the file, all in the same python
script. As a simple exercise, I launched /bin/ls but this doesn't
work:

#!/usr/bin/python
import subprocess as proc
import tempfile
name = tempfile.NamedTemporaryFile(mode='w+b')
print 'name is '+ name.name

cmd = []
cmd.append('/bin/ls')
cmd.append('-l')
cmd.append('/tmp')
p = proc.Popen(cmd, stdout=name, stderr=proc.STDOUT, close_fds=True)
while True:
    ret = p.poll()
    if (ret is not None):
        output = name.readlines()
        print 'out = ', output
        break

$ python sub.py
name is /tmp/tmpjz4NJY
out =  []


I tried calling flush() on the file object but this didn't help
either. Tried closing and re-opening the file, but closing the file
object results in it getting deleted. Can the above be made to work by
using tempfiles?
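
A minimal sketch of one way to make this work, assuming the empty list
comes from the file position: the child writes through the same open
file, so by the time it exits the shared offset is at the end of the
file and readlines() has nothing left to read. Seeking back to the start
before reading is the only real change. run_and_capture is a
hypothetical helper name, the code is written for current Python 3 on a
POSIX system, and the Python interpreter itself stands in for /bin/ls so
the example does not depend on the contents of /tmp.

```python
import subprocess
import sys
import tempfile


def run_and_capture(cmd):
    """Run cmd with stdout and stderr sent to a temp file; return (code, bytes)."""
    with tempfile.NamedTemporaryFile(mode="w+b") as tmp:
        proc = subprocess.Popen(cmd, stdout=tmp, stderr=subprocess.STDOUT)
        # wait() blocks until the child exits, replacing the poll() loop.
        returncode = proc.wait()
        # The child advanced the shared file offset; rewind before reading.
        tmp.seek(0)
        output = tmp.read()
    # Leaving the with block closes the file, which also deletes it.
    return returncode, output


code, out = run_and_capture([sys.executable, "-c", "print('hello')"])
print("exit code =", code)
print("out =", out)
```

The file is read while it is still open, so the delete-on-close
behaviour of NamedTemporaryFile does not get in the way.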

thanks


fork after creating temporary file using NamedTemporaryFile

2008-07-15 Thread rparimi
Hello pythoners,

When I create a temporary file using the tempfile module, and fork()
later on in my program, I always see errors when the program exits. Is
this because the child process deletes the temp file?
Here's a stripped down version of my script that exhibits this
problem:

#!/usr/bin/python

import os
import tempfile
import sys

cmd = []
cmd.append('/bin/ls')
cmd.append('-l')
cmd.append('/tmp')

foo = tempfile.NamedTemporaryFile(mode='w+b')

pid = os.fork()
if pid:
    print 'I am parent'
else:
    print 'I am child'
sys.exit(0)

$ python sub.py
I am child
I am parent
Exception exceptions.OSError: (2, 'No such file or directory', '/tmp/
tmp-mZTPq') in ', mode 'w+b' at 0xb7d2a578>> ignored


How can these warnings be avoided? I tried to catch this exception
using try/except but it didn't work.
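
A minimal sketch of one way to avoid the message, assuming it comes from
both processes trying to delete the same file: after fork() the parent
and the child each hold a copy of the temporary file object, each
deletes the file when the object is cleaned up at exit, and whichever
comes second finds it already gone. Because that happens during cleanup
rather than in the script's own code, a try/except around the script
does not see it. Here the child leaves with os._exit(), which skips that
cleanup, so only the parent deletes the file. fork_with_tempfile is a
hypothetical name, and the code is written for current Python 3 on a
POSIX system.

```python
import os
import tempfile


def fork_with_tempfile():
    """Fork while a NamedTemporaryFile is open; only the parent deletes it."""
    tmp = tempfile.NamedTemporaryFile(mode="w+b")
    path = tmp.name
    pid = os.fork()
    if pid == 0:
        # Child: exit at once without interpreter cleanup, so its copy of
        # the temp file object never tries to unlink the file.
        os._exit(0)
    # Parent: wait for the child, then check the file is still there.
    _, status = os.waitpid(pid, 0)
    exists_after_child = os.path.exists(path)
    # Closing in the parent is the single place the file gets deleted.
    tmp.close()
    return status, exists_after_child, os.path.exists(path)


status, exists_after_child, exists_after_close = fork_with_tempfile()
print("child status =", status)
print("file present after child exit:", exists_after_child)
print("file present after parent close:", exists_after_close)
```

Note that os._exit() also skips flushing of buffered output in the
child, so anything the child needs to print should be flushed first.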

thanks!