On Wed, 23 Feb 2005 23:16:20 -0500
Kent Johnson <[EMAIL PROTECTED]> wrote:

> How about
>    n = self.nextfile
>    if not isinstance(n, unicode):
>      n = unicode(n, 'iso8859-1')
> ?
> 
> > At least this might explain why "A\xe4" worked and "\xe4" did not, as I
> > mentioned in a previous post.
> > Now the problem arises of how to determine whether self.nextfile is
> > unicode or a byte string.
> > Or maybe even better, make sure that self.nextfile is always a byte
> > string so I can safely convert it to unicode later on. But how do I
> > convert unicode user input into byte strings when I don't even know
> > the user's encoding? I guess this will require some further research.
> 
> Why do you need to convert back to byte strings?
> 
> You can find out the console encoding from sys.stdin and stdout:
>   >>> import sys
>   >>> sys.stdout.encoding
> 'cp437'
>   >>> sys.stdin.encoding
> 'cp437'
> 

I *thought* I would have to convert the user input, which might be in any
encoding, back into a byte string first (remember, I got heavily confused
because user input was sometimes unicode and sometimes a byte string), so
that I could convert it to "standard" unicode (utf-8) later on.
I've added this test to the file selection method, where "result" holds the
filename the user chose:

    if isinstance(result, unicode):
        result = result.encode('iso8859-1')
    return result

Later on, self.nextfile is set to "result".

The idea was, if I could catch the user's encoding, I could do something like:

    if isinstance(result, unicode):
        result = result.encode(sys.stdin.encoding)
    result = unicode(result, 'utf-8')

to avoid problems with unicode objects that have different encodings. Or
isn't this necessary at all?
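
To make the two cases concrete, here is a minimal sketch, assuming the codec
of the byte string is known. The name to_text is a hypothetical helper, not
part of any library, and cp437 is only the console encoding from the example
quoted above. It decodes a byte string once, leaves a unicode object alone,
and shows that bytes encoded with one codec generally cannot be decoded with
a different one:

```python
# Sketch only: to_text is a hypothetical helper name, and cp437 is just
# the console encoding from the quoted example, not a general default.

text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_text(value, encoding="iso8859-1"):
    """Return value as text, decoding byte strings with the given codec."""
    if isinstance(value, text_type):
        # Already text: there is nothing left to decode.
        return value
    return value.decode(encoding)


name = u"\xe4"              # a-umlaut as a text object
raw = name.encode("cp437")  # the byte string a cp437 console would produce

# Decoding with the same codec gives the original text back.
round_trip = to_text(raw, "cp437")

# Decoding the same bytes with a different codec (utf-8) fails.
try:
    raw.decode("utf-8")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

A unicode object itself carries no encoding; only the byte strings going in
and out do. So decoding byte strings once at the point of input may be all
that is needed here.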

I'm sorry if this is a dumb question, but I'm afraid I'm a complete
encoding-idiot.

Thanks and best regards

Michael




_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
