On Wed, 23 Feb 2005 23:16:20 -0500 Kent Johnson <[EMAIL PROTECTED]> wrote:
> How about
>     n = self.nextfile
>     if not isinstance(n, unicode):
>         n = unicode(n, 'iso8859-1')
> ?
>
> > At least this might explain why "A\xe4" worked and "\xe4" not as I
> > mentioned in a previous post.
> > Now the problem arises how to determine if self.nextfile is unicode or a
> > byte string?
> > Or maybe even better, make sure that self.nextfile is always a byte string
> > so I can safely convert it to unicode later on. But how to convert unicode
> > user input into byte strings when I don't even know the user's encoding?
> > I guess this will require some further research.
>
> Why do you need to convert back to byte strings?
>
> You can find out the console encoding from sys.stdin and stdout:
>     >>> import sys
>     >>> sys.stdout.encoding
>     'cp437'
>     >>> sys.stdin.encoding
>     'cp437'

I *thought* I would have to convert the user input, which might be in any encoding, back into a byte string first (remember, I got heavily confused because user input was sometimes unicode and sometimes a byte string), so that I can convert it to "standard" unicode (utf-8) later on.

I've added this test to the file selection method, where "result" holds the filename the user chose:

    if isinstance(result, unicode):
        result = result.encode('iso8859-1')
    return result

Later on, self.nextfile is set to "result".

The idea was, if I could catch the user's encoding, I could do something like:

    if isinstance(result, unicode):
        result = result.encode(sys.stdin.encoding)
        result = unicode(result, 'utf-8')

to avoid problems with unicode objects that have different encodings - or isn't this necessary at all?

I'm sorry if this is a dumb question, but I'm afraid I'm a complete encoding-idiot.

Thanks and best regards

Michael
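
For reference, a minimal sketch of the isinstance check suggested in the quoted reply, wrapped in a helper. The names `to_unicode` and `text_type` are hypothetical and not from the thread, and 'iso8859-1' as the default is only the encoding assumed in the quoted snippet. The sketch is written so that it runs on both Python 2.6+ and Python 3. It also illustrates that a unicode object carries no encoding of its own: the same text decoded from two different byte encodings gives equal unicode objects.

```python
# type(u'') is `unicode` on Python 2 and `str` on Python 3
text_type = type(u'')


def to_unicode(value, encoding='iso8859-1'):
    """Return value as a unicode object (hypothetical helper).

    Unicode input is returned unchanged; byte strings are decoded
    with `encoding`, which the caller has to know or assume.
    """
    if isinstance(value, text_type):
        return value
    return value.decode(encoding)


# The same text, stored as bytes in two different encodings
from_latin1 = to_unicode(b'A\xe4', 'iso8859-1')
from_utf8 = to_unicode(b'A\xc3\xa4', 'utf-8')

# After decoding, both are the same unicode object value
same_text = (from_latin1 == from_utf8)

# Input that is already unicode passes through untouched
already_unicode = to_unicode(u'\xe4')
```

Under these assumptions, something like `n = to_unicode(self.nextfile)` would leave unicode filenames alone and decode only byte strings, so no round trip back to a byte string would be involved.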