"kai_nerda" <ew...@yahoo.com> wrote in message
news:hp69ri+a...@egroups.com...
Hi,
OS = Windows XP (German language)
Python = 3.1.2
I need to write a directory listing into a XML file.
And after hours of trying and searching i have no clue.
My main problem is that the file and folder names can
have characters of different languages like
German, Turkish, Russian, maybe else.
Because Python 3.1 is better with unicode, I
decided to use that instead of 2.6
For testing I have created the following files:
http://img340.imageshack.us/img340/3461/files.png
(google for the words
russia, turkish, deutsch, france
to find websites with special characters and copy & paste)
And this is the code I have now:
############################################
# -*- coding: iso-8859-1 -*-
# inspired by:
# http://www.dpawson.co.uk/java/dirlist.py
# (for Python ~2.4)
import sys
print ('filesystemencoding: ' + sys.getfilesystemencoding())
print ('defaultencoding: ' + sys.getdefaultencoding())
from pprint import pprint
import os.path
from stat import *
from xml.sax.saxutils import XMLGenerator
def recurse_dir(path, writer):
for cdir, subdirs, files in os.walk(path):
pprint (cdir)
writer.startElement('dir', { 'name': cdir })
for f in files:
uf = f.encode('utf-8')
pprint (uf)
attribs = {'name': f}
attribs['size'] = str(os.stat(os.path.join(cdir,f))[ST_SIZE])
pprint (attribs)
writer.startElement('file', attribs)
writer.endElement('file')
for subdir in subdirs:
recurse_dir(os.path.join(cdir, subdir), writer)
writer.endElement('directory')
This should be:
writer.endElement('dir')
break
if __name__ == '__main__':
directory = 'c:\\_TEST\\'
out = open('C:\\_TEST.xml','w')
The above line opens the file in the default file system encoding 'mbcs'
(cp850 on your system). Try:
out = open('C:\\_TEST.xml','w',encoding='utf8')
Regards,
-Mark
writer = XMLGenerator(out, 'utf-8')
writer.startDocument()
recurse_dir(directory, writer)
out.close()
############################################
--
http://mail.python.org/mailman/listinfo/python-list