Michael Lange wrote:
Now it looks like the total confusion is starting to clear up (at least partially). After some
googling, it seems to me that the best bet is to use unicode strings exclusively.

I think that is a good plan.

When I set the unicode flag in gettext.install() to 1, the gettext strings are unicode, but
there's still a problem with the user input. As you guessed, "self.nextfile" is unicode only
*sometimes*; I tried changing the line from the old traceback into:

if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):

How about

    n = self.nextfile
    if not isinstance(n, unicode):
        n = unicode(n, 'iso8859-1')

?
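The same check can be wrapped in a small function so that every piece of user input goes through one place. This is only a sketch: to_text is a made-up helper name, and the iso8859-1 default is simply the encoding used in the line above. It tests for bytes rather than unicode, which is an alias for str on Python 2.6 and later, so the snippet also runs unchanged on Python 3.

```python
def to_text(value, encoding='iso8859-1'):
    # to_text is a hypothetical helper name, not part of the original program.
    # Byte strings are decoded; text that is already unicode passes through,
    # so it is safe to call this on a value of either type.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(repr(to_text(b'A\xe4')))   # byte string in, unicode out
print(repr(to_text(u'\xe4')))    # already unicode, returned unchanged
```

With something like this, the comparison could be written as to_text(self.nextfile) == _('No destination file selected') without caring which type self.nextfile currently holds.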

At least this might explain why "A\xe4" worked and "\xe4" did not, as I mentioned in a previous
post. Now the problem is how to determine whether self.nextfile is unicode or a byte string.
Or maybe even better, make sure that self.nextfile is always a byte string so I can safely
convert it to unicode later on. But how do I convert unicode user input into byte strings when
I don't even know the user's encoding? I guess this will require some further research.

Why do you need to convert back to byte strings?

You can find out the console encoding from sys.stdin and stdout:
 >>> import sys
 >>> sys.stdout.encoding
'cp437'
 >>> sys.stdin.encoding
'cp437'

IIRC there is also an encoding associated with the current locale, but I'm not sure
how to use that.
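For the locale, the standard library has locale.getpreferredencoding(). A rough sketch of combining the two sources follows; guess_input_encoding and the utf-8 fallback are my own assumptions, not something the program already has, and sys.stdin.encoding may well be None when input is not coming from a real console.

```python
import locale
import sys


def guess_input_encoding(fallback='utf-8'):
    # guess_input_encoding and the utf-8 fallback are illustrative assumptions.
    # sys.stdin.encoding can be None (or missing) when stdin is piped or replaced.
    encoding = getattr(sys.stdin, 'encoding', None)
    if not encoding:
        # Ask the locale module which encoding the user's locale prefers.
        encoding = locale.getpreferredencoding()
    return encoding or fallback


print(guess_input_encoding())
```

The result is only a guess about what the user's input is encoded in; it is useful for decoding at the boundary, not as a guarantee.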

Unfortunately the latter is not an option, because I definitely need portability. I guess I
should probably use utf-8.

UTF-8 is your friend :-)
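One way to read that, as a sketch only: keep unicode inside the program and encode to UTF-8 just where data leaves it, for example when writing a file. The temporary file and the sample string here are made up for illustration; io.open is available on Python 2.6 and later as well as Python 3.

```python
import io
import os
import tempfile

# Sample text with a non-ASCII character; purely illustrative.
text = u'No destination file selected: \xe4'

# A throwaway file stands in for whatever the program really writes.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Encode on the way out ...
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # ... and decode on the way back in.
    with io.open(path, 'r', encoding='utf-8') as f:
        restored = f.read()
finally:
    os.remove(path)

# The same conversion done by hand on a string.
encoded = text.encode('utf-8')
print(restored == text, encoded.decode('utf-8') == text)
```

Because UTF-8 can represent every unicode character, the round trip does not depend on the user's locale.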

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
