Tim Roberts wrote:
josh logan <[EMAIL PROTECTED]> wrote:
I am using Python 3.0b2.
I have an XML file that has the unicode character '\u012b' in it,
which, when parsed, causes a UnicodeEncodeError:

'charmap' codec can't encode character '\u012b' in position 26: character maps to <undefined>

This happens even when I assign this character to a reference in the
interpreter:

Python 3.0b2 (r30b2:65106, Jul 18 2008, 18:44:17) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\u012b'
>>> s
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Python30\lib\io.py", line 1428, in write
   b = encoder.encode(s)
 File "C:\Python30\lib\encodings\cp437.py", line 19, in encode
   return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>

Is this a known issue, or am I doing something wrong?

Both.  U+012B is the Latin lower-case i with macron (i with a bar instead
of a dot).  That character does not exist in the 8-bit character set CP437,
which you are trying to use.

If you choose an encoding that includes i-with-macron, then it will work.
UTF-8 would be a good choice.  Among 8-bit character sets, it's in ISO-8859-10.
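A quick way to check the claim above is to try encoding the character with each codec. This is a minimal sketch assuming a modern Python 3; `can_encode` is a hypothetical helper, while `cp437`, `iso-8859-10` and `utf-8` are all standard-library codecs.

```python
# Sketch: which codecs can represent U+012B (LATIN SMALL LETTER I WITH MACRON)?
CHAR = '\u012b'


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ('cp437', 'iso-8859-10', 'utf-8'):
    print(enc, can_encode(CHAR, enc))
```

Run as-is, cp437 reports False and the other two report True; in UTF-8 the character becomes the two bytes `b'\xc4\xab'`.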

I doubt the OP 'chose' cp437. Why does Python use cp437 even when the default encoding is utf-8?

On WinXP
>>> sys.getdefaultencoding()
'utf-8'
>>> s='\u012b'
>>> s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python30\lib\io.py", line 1428, in write
    b = encoder.encode(s)
  File "C:\Program Files\Python30\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u012b' in position 1: character maps to <undefined>
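The traceback comes from `io.py` writing to the console stream, not from a conversion governed by `sys.getdefaultencoding()`. Echoing `s` at the prompt writes its repr to `sys.stdout`, and `sys.stdout` has its own encoding, which on a Windows console typically follows the console code page. A small diagnostic sketch (the function name `encodings_in_use` is hypothetical) prints both values side by side:

```python
import sys


def encodings_in_use():
    """Report the two encodings that are easy to confuse (hypothetical helper)."""
    return {
        # Used by str/bytes conversions that don't name an encoding explicitly.
        'default': sys.getdefaultencoding(),
        # Used by the text stream when writing to the terminal; on a Windows
        # console this typically follows the console code page (e.g. cp437).
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


print(encodings_in_use())
```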

To put it another way, how can one 'choose' utf-8 for display to screen?
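One approach is to wrap the underlying binary stream in a text layer that encodes as UTF-8. The sketch below assumes a modern Python 3 and demonstrates on an in-memory buffer so the resulting bytes can be inspected. `utf8_text_stream` is a hypothetical helper; `io.TextIOWrapper`, `sys.stdout.buffer`, `sys.stdout.reconfigure` (3.7+) and the `PYTHONIOENCODING` environment variable are real. Whether the glyph then displays correctly still depends on the console's code page and font.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so text written to it is encoded as UTF-8 (hypothetical helper)."""
    # newline='\n' disables newline translation, so the bytes are predictable.
    return io.TextIOWrapper(binary_stream, encoding='utf-8', newline='\n')


# Demonstrate on an in-memory buffer rather than the real console.
buf = io.BytesIO()
out = utf8_text_stream(buf)
out.write('\u012b')
out.flush()
print(buf.getvalue())  # b'\xc4\xab'

# On a real console, the same idea would be, for example:
#   sys.stdout = utf8_text_stream(sys.stdout.buffer)
# or, on Python 3.7 and later:
#   sys.stdout.reconfigure(encoding='utf-8')
```

Setting `PYTHONIOENCODING=utf-8` before starting the interpreter is an alternative that avoids touching code.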

Using IDLE, display works fine.

IDLE 3.0b2
>>> s='\u012b'
>>> s
'ī' # i macron
>>> import sys
>>> sys.getdefaultencoding()
'utf-8'

I ran across this in a different context and mentioned it on the bug tracker, but the Windows interpreter seems broken here.
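If the console's encoding can't be changed, a different workaround is to escape whatever that encoding lacks before writing. This sketch uses the standard `'backslashreplace'` error handler; `displayable` is a hypothetical helper, not part of any library.

```python
def displayable(text, encoding):
    """Return `text` with characters that `encoding` lacks shown as escapes (hypothetical helper)."""
    # 'backslashreplace' turns unencodable characters into \uXXXX sequences,
    # so the result can always be written to a stream using `encoding`.
    return text.encode(encoding, 'backslashreplace').decode(encoding)


print(displayable('\u012b', 'cp437'))     # \u012b
print(displayable('caf\u00e9', 'cp437'))  # café
```

Characters that cp437 does contain, such as é, pass through unchanged; only the missing ones are escaped.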

I will send this in UTF-8 so the i-macron will hopefully show up.

tjr

--
http://mail.python.org/mailman/listinfo/python-list
