Re: [Tutor] UnicodeDecodeError

Kent Johnson Wed, 23 Feb 2005 20:16:25 -0800

Michael Lange wrote:

Now it looks like the total confusion is starting to clear up (at least partially). After some googling, it seems to me that the best bet is to use unicode strings exclusively.


I think that is a good plan.

When I set the unicode flag in gettext.install() to 1 the gettext strings are unicode; however, there's still a problem with the user input. As you guessed, "self.nextfile" is unicode only *sometimes*; I tried changing the line from the old traceback into:

if unicode(self.nextfile, 'iso8859-1') == _('No destination file selected'):


How about
  n = self.nextfile
  if not isinstance(n, unicode):
    n = unicode(n, 'iso8859-1')
?
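
The same check can be wrapped in a small helper so that every place reading user input gets a unicode string back. This is only a sketch: the name to_unicode is made up for illustration, the text_type shim is an addition so the snippet also runs on Python 3 (where str is already unicode), and 'iso8859-1' is just the encoding used above, not necessarily the right one for every user.

```python
try:
    text_type = unicode   # Python 2: the unicode type
except NameError:
    text_type = str       # Python 3: str is already unicode


def to_unicode(value, encoding='iso8859-1'):
    """Return value unchanged if it is already unicode, else decode it.

    to_unicode is a hypothetical helper name; encoding is an assumption
    about how the byte string was encoded.
    """
    if not isinstance(value, text_type):
        value = value.decode(encoding)
    return value
```

With something like that in place, the comparison could be written as
if to_unicode(self.nextfile) == _('No destination file selected'):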

At least this might explain why "A\xe4" worked and "\xe4" did not, as I mentioned in a previous post. Now the problem arises of how to determine whether self.nextfile is unicode or a byte string. Or maybe even better, make sure that self.nextfile is always a byte string so I can safely convert it to unicode later on. But how do I convert unicode user input into byte strings when I don't even know the user's encoding? I guess this will require some further research.


Why do you need to convert back to byte strings?

You can find out the console encoding from sys.stdin and stdout:
 >>> import sys
 >>> sys.stdout.encoding
'cp437'
 >>> sys.stdin.encoding
'cp437'

IIRC there is also an encoding associated with the current locale, but I'm not sure how to use that.
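
For the locale part, the standard library has locale.getpreferredencoding(), which reports the encoding the user's locale prefers. A rough sketch of combining the two sources follows; the function name guess_input_encoding and the 'utf-8' fallback are assumptions for illustration, and sys.stdin.encoding can be None when input does not come from a terminal.

```python
import locale
import sys


def guess_input_encoding(default='utf-8'):
    """Best-effort guess at the encoding of user input.

    guess_input_encoding is a hypothetical helper; the 'utf-8' default
    is an assumption, used only when nothing else is known.
    """
    # Prefer what the console reports; the attribute may be missing or None.
    encoding = getattr(sys.stdin, 'encoding', None)
    if not encoding:
        # Fall back to the encoding preferred by the current locale.
        encoding = locale.getpreferredencoding()
    return encoding or default
```

The result is only a guess: a file name typed into a GUI widget, for example, need not use the console's encoding.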

Unfortunately the latter is not an option, because I definitely need portability. I guess I should probably use utf-8.


UTF-8 is your friend :-)
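
One reason UTF-8 is a portable choice: it can encode every unicode character, so the program can keep unicode strings internally and encode only where bytes are really needed (writing a file, for instance). A small sketch with a made-up sample string; note that the b'...' literal needs Python 2.6 or later:

```python
text = u'A\xe4'                        # unicode string: 'A' followed by a-umlaut
data = text.encode('utf-8')            # byte string, suitable for writing to a file

# In UTF-8 the a-umlaut (U+00E4) becomes the two bytes 0xC3 0xA4.
assert data == b'A\xc3\xa4'

# Decoding with the same encoding restores the original unicode string.
assert data.decode('utf-8') == text
```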

Kent

