I think it helped me very much to understand the problem.
So if I deal with non-ASCII strings, I have a 'list of bytes' and need an
encoding to interpret this list and transform it into a meaningful unicode
string. That step is decoding; encoding does the opposite.
Whenever I 'cross the border' of my program, I have to decode the 'list
of bytes' to a unicode string, or encode the unicode string to a 'list
of bytes' which is meaningful to the world outside.
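A minimal sketch of that round trip, where decoding turns bytes into a unicode string and encoding turns it back. It assumes UTF-8 as the encoding; the sample bytes and the variable names are made up for illustration, and the snippet should run on Python 2.6+ as well as Python 3:

```python
# Sketch only: the sample data and names are illustrative assumptions.
raw = b'M\xc3\xbcller'        # bytes as they arrive from outside (UTF-8)
text = raw.decode('utf-8')    # bytes -> unicode string (decoding)
back = text.encode('utf-8')   # unicode string -> bytes (encoding)

print(len(raw))     # 7, the u-umlaut takes two bytes in UTF-8
print(len(text))    # 6, one item per character
print(back == raw)  # True, the round trip loses nothing
```

The different lengths show why the bytes alone are not enough: without knowing the encoding there is no way to tell where one character ends.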
So 'decode early, encode late' means doing it as near to the border as
possible, and to encode/decode I need a coding system, for example 'utf8'.
That means there should be a way to encode/decode at every
interface I can use: files, stdin, stdout, stderr, gui (these should be the
most important ones).
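To make the 'border' idea concrete, here is a small sketch that decodes on the way in and encodes on the way out. The two io.BytesIO objects are only hypothetical stand-ins for a real file or pipe, the data is made up, and UTF-8 is an assumption about what the outside world uses:

```python
import io

# Hypothetical stand-ins for a real input and output file.
incoming = io.BytesIO(b'M\xc3\xbcller\n')
outgoing = io.BytesIO()

# Decode right at the input border: the wrapper hands out unicode strings.
reader = io.TextIOWrapper(incoming, encoding='utf-8')
name = reader.readline().strip()

# Inside the program, work with unicode only.
greeting = u'My name is %s' % name

# Encode right at the output border: the wrapper turns unicode into bytes.
writer = io.TextIOWrapper(outgoing, encoding='utf-8')
writer.write(greeting)
writer.flush()

result = outgoing.getvalue()  # the UTF-8 bytes that left the program
```

For real files, io.open(path, encoding='utf-8') gives the same kind of wrapper, so the rest of the program never has to touch raw bytes.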
While trying to understand this, I wrote the following program. Maybe
someone can give me a hint on how to print correctly:
######################################################
#! python
# -*- coding: utf-8 -*-
class EncTest:
    def __init__(self, Name=None):
        self.Name = unicode(Name, encoding='utf8')

    def __repr__(self):
        return u'My name is %s' % self.Name


if __name__ == '__main__':
    a = EncTest('Müller')
    # this does work
    print a.__repr__()
    # throws an error if default encoding is ascii
    # but works if default encoding is utf8
    print a
    # throws an error because a is not a string
    print unicode(a, encoding='utf8')
######################################################
Wolfgang
--
http://mail.python.org/mailman/listinfo/python-list