Re: [Tutor] Encode problem

Mark Tolonen Mon, 04 May 2009 22:15:04 -0700

"spir" <denis.s...@free.fr> wrote in message news:20090501220601.31891...@o...

On Fri, 1 May 2009 15:19:29 -0300,
"Pablo P. F. de Faria" <pablofa...@gmail.com> wrote:
self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))

As one can see, the character encoding is explicitly UTF-8. But
ConfigParser keeps trying to save it as an 'ascii' file and gives me an
error for directory names containing non-ASCII characters (code points
above 127, like "Á"). It is just a horrible thing for me, since my app
will be used mostly by Brazilians.
Just superficial suggestions, only because it's the 1st of May and then the weekend, so better answers may not come up before Monday.
If everything you describe is right, then there must be something wrong with the character encoding in ConfigParser's write method. Have you had a look at it? Though I can hardly imagine why or how ConfigParser would limit file paths to 7-bit ASCII... Also, for Portuguese characters, you shouldn't even need an explicit encoding; they should pass through silently because they fit in an 8-bit Latin charset. (I never encode French path/file names.)
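To make the "fits in an 8-bit Latin charset" point concrete, here is a small sketch. It is written for Python 3 (where `str` is Unicode), not the Python 2 used in this thread. "Á" is code point 193, so it takes one byte in Latin-1 but two bytes in UTF-8, and the ASCII codec rejects it outright, which is why any implicit ASCII conversion along the way fails on such names.

```python
# "Á" (U+00C1) is code point 193, outside the 7-bit ASCII range (0-127).
ch = "Á"
print(ord(ch))               # 193
print(ch.encode("latin-1"))  # b'\xc1'      (one byte in an 8-bit Latin charset)
print(ch.encode("utf-8"))    # b'\xc3\x81'  (two bytes in UTF-8)

# The ASCII codec cannot represent it at all.
try:
    ch.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii codec fails:", exc.reason)  # ordinal not in range(128)
```

Note that the Latin-1 and UTF-8 byte sequences differ, so "passing through silently" only works if whatever reads the bytes later assumes the same encoding that wrote them.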

The following works. ConfigParser isn't written to support Unicode correctly. I was able to get Unicode sections to write out, but that was just luck. Unicode keys and values break, as the OP discovered. So treat everything as byte strings:


----------------------------------------------------
# coding: utf-8
# Note: the coding declaration is required because of the non-ASCII
# characters in the source code.  It ONLY controls the encoding of
# this source file's characters as saved to disk.
import ConfigParser
import glob
import sys
c = ConfigParser.ConfigParser()
c.add_section('马克') # this is a UTF-8 encoded byte string (no u'' prefix)
c.set('马克','多少','明白') # so are these

# The following could be glob.glob(u'*.txt') to get the filenames
# as Unicode, but this illustrates that the encoding of the source
# file has no bearing on the encoding of strings other than the
# ones hard-coded in the source file.  The 'files' list will hold
# byte strings in the default file system encoding, which for
# Windows is 'mbcs'...a magic value that changes depending on
# which country's version of Windows is running.
files = glob.glob('*.txt')
c.add_section('files')

for i, fn in enumerate(files):
    # Transcode: decode from the file system encoding, re-encode as UTF-8.
    fn = fn.decode(sys.getfilesystemencoding())
    fn = fn.encode('utf-8')
    c.set('files', 'file%d' % (i + 1), fn)

# No codec needed here...everything is already UTF-8 encoded bytes.
# The with statement ensures the file is flushed and closed.
with open('chinese.txt', 'w') as f:
    c.write(f)
--------------------------------------------------------------
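The decode/encode pair inside the loop is a transcoding step: first decode the bytes with the encoding they are actually in, then re-encode them with the encoding the config file should use. A minimal Python 3 sketch of the same two steps, assuming the file system hands back cp1252 bytes (an assumed stand-in for 'mbcs' on a Western European Windows install; the file name literal is hypothetical):

```python
# Hypothetical file name bytes, as a cp1252 file system might return them.
raw = "ÀÈÌÒÙ.txt".encode("cp1252")

# Step 1: decode with the encoding the bytes are really in.
text = raw.decode("cp1252")

# Step 2: re-encode with the encoding the config file should use.
utf8 = text.encode("utf-8")

print(len(raw), len(utf8))  # 9 14: each accented letter grows from 1 to 2 bytes
```

Skipping step 1 and writing `raw` straight into a UTF-8 file would store cp1252 bytes that a UTF-8 reader cannot decode correctly.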

Here is the content of my utf-8 file:

-----------------------------
[files]
file3 = ascii.txt
file2 = chinese.txt
file1 = blah.txt
file5 = ÀÈÌÒÙ.txt
file4 = other.txt

[马克]
多少 = 明白
----------------------------
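For comparison, on Python 3 the byte-string workaround is unnecessary: configparser stores section names, keys, and values as `str`, and the encoding is chosen once, when the file is opened. A minimal sketch under that assumption (the temporary path is only for illustration):

```python
import configparser
import os
import tempfile

# Python 3: section names, keys and values are ordinary (Unicode) str.
config = configparser.ConfigParser()
config["马克"] = {"多少": "明白"}
config["files"] = {"file1": "ÀÈÌÒÙ.txt"}

# The only encoding decision is made once, when the file is opened.
path = os.path.join(tempfile.mkdtemp(), "chinese.txt")
with open(path, "w", encoding="utf-8") as f:
    config.write(f)

# Read it back with the same encoding to confirm the round trip.
check = configparser.ConfigParser()
check.read(path, encoding="utf-8")
print(check["马克"]["多少"])  # prints 明白
```

Passing `encoding="utf-8"` explicitly on both sides avoids depending on the platform's default locale encoding.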

Hope this helps,
Mark


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
