Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Andrea Griffini
John Machin wrote: Indeed yourself. What does the above mean? Have you ever considered reading posts in chronological order, or reading all posts in a thread? I do not think people read posts in chronological order; it simply doesn't make sense. I also don't think many do read threads

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Paul Boddie
Thomas W wrote: Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). Yes, you may encounter situations where you have some string, you decode it (i.e. convert it to Unicode) using one character encoding, but then you later encode it (i.e. convert it

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Cliff Wells
On Tue, 2006-11-07 at 08:10 +0200, Hendrik van Rooyen wrote: John Machin wrote: 8<--- I strongly suggest that you read the docs *FIRST*, and don't tinker at all. This is *good* advice - it's unlikely to be followed though, as the

Re: Unicode/ascii encoding nightmare

2006-11-07 Thread Cliff Wells
On Mon, 2006-11-06 at 15:47 -0800, John Machin wrote: Gabriel Genellina wrote: At Monday 6/11/2006 20:34, Robert Kern wrote: John Machin wrote: Indeed yourself. Have you ever considered reading posts in chronological order, or reading all posts in a thread? That presumes that

Unicode/ascii encoding nightmare

2006-11-06 Thread Thomas W
I'm getting really annoyed with Python in regards to unicode/ascii-encoding problems. The string below is the encoding of the Norwegian word fødselsdag. s = 'f\xc3\x83\xc2\xb8dselsdag' I stored the string as fødselsdag but somewhere in my code it got translated into the mess above and I cannot

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Mark Peters
The string below is the encoding of the Norwegian word fødselsdag. s = 'f\xc3\x83\xc2\xb8dselsdag' I'm not sure which encoding method you used to get the string above. Here's the result of my playing with the string in IDLE: u1 = u'fødselsdag' u1 u'f\xf8dselsdag' s1 = u1.encode('utf-8')

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Robert Kern
Thomas W wrote: I'm getting really annoyed with Python in regards to unicode/ascii-encoding problems. The string below is the encoding of the Norwegian word fødselsdag. s = 'f\xc3\x83\xc2\xb8dselsdag' I stored the string as fødselsdag but somewhere in my code it got translated into the

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Thomas W wrote: I'm getting really annoyed with Python in regards to unicode/ascii-encoding problems. The string below is the encoding of the Norwegian word fødselsdag. s = 'f\xc3\x83\xc2\xb8dselsdag' There is no such thing as *the* encoding of any given string. I stored the string as

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Robert Kern wrote: However, I don't know of an encoding that takes u"fødselsdag" to 'f\xc3\x83\xc2\xb8dselsdag'. There isn't one. C3 and C2 hint at UTF-8. The fact that C3 and C2 are both present, plus the fact that one non-ASCII byte has morphoploded into 4 bytes indicate a double whammy.
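The byte-count reasoning above can be checked directly. What follows is only a minimal sketch, written in Python 3 syntax (an assumption; the thread itself uses Python 2 `u'...'` literals), and the variable names are illustrative. It compares the word under two common encodings with the mangled byte string from the original post, and confirms that a handful of common codecs do not produce those bytes in a single step.

```python
word = "fødselsdag"
mangled = b"f\xc3\x83\xc2\xb8dselsdag"  # byte string quoted from the original post

latin1 = word.encode("iso-8859-1")  # ø is the single byte F8
utf8 = word.encode("utf-8")         # ø is the two bytes C3 B8

# ø grows from 1 byte (Latin-1) to 2 (UTF-8) to 4 (mangled form).
sizes = (len(latin1), len(utf8), len(mangled))
print(sizes)

# A small, non-exhaustive sample of codecs: none yields the mangled bytes.
for codec in ("ascii", "iso-8859-1", "cp1252", "utf-8", "utf-16"):
    try:
        encoded = word.encode(codec)
    except UnicodeEncodeError:
        continue  # codec cannot represent ø at all (e.g. ascii)
    assert encoded != mangled
```

The codec list is a sample chosen for illustration, so it supports the claim rather than proving it for every encoding.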

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Andrea Griffini
John Machin wrote: The fact that C3 and C2 are both present, plus the fact that one non-ASCII byte has morphoploded into 4 bytes indicate a double whammy. Indeed... x = u"fødselsdag" x.encode('utf-8').decode('iso-8859-1').encode('utf-8') 'f\xc3\x83\xc2\xb8dselsdag' Andrea
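The round trip in this post can be reproduced and reversed. The sketch below uses Python 3 syntax (an assumption; the post is Python 2), and the helper names `double_encode` and `repair` are hypothetical, introduced here only for illustration. It shows one way the damage could arise and how to undo it when the data is known to be double-encoded in exactly this manner.

```python
def double_encode(text):
    """Encode to UTF-8, misread the bytes as ISO-8859-1, encode to UTF-8 again."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def repair(data):
    """Reverse double_encode by applying the inverse steps in reverse order."""
    return data.decode("utf-8").encode("iso-8859-1").decode("utf-8")


mangled = double_encode("fødselsdag")
print(mangled)          # the byte string from the original post
print(repair(mangled))  # the original word
```

`repair` is not safe as a blanket fix: applied to correctly encoded UTF-8 such as `b'f\xc3\xb8dselsdag'`, the final decode step raises `UnicodeDecodeError`, so it should only be used on data already diagnosed as double-encoded.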

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Georg Brandl
Thomas W wrote: I'm getting really annoyed with Python in regards to unicode/ascii-encoding problems. The string below is the encoding of the Norwegian word fødselsdag. s = 'f\xc3\x83\xc2\xb8dselsdag' Which encoding is this? I stored the string as fødselsdag but somewhere in my code it

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Thomas W
Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). My understanding of unicode has room for improvement, that's for sure. I got some pointers and initial code-cleanup seems to have removed some of the strange results I got, which several of you also

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Thomas W wrote: Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). My understanding of unicode has room for improvement, that's for sure. I got some pointers and initial code-cleanup seems to have removed some of the strange results I got, which

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Andrea Griffini wrote: John Machin wrote: The fact that C3 and C2 are both present, plus the fact that one non-ASCII byte has morphoploded into 4 bytes indicate a double whammy. Indeed... x = u"fødselsdag" x.encode('utf-8').decode('iso-8859-1').encode('utf-8')

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Robert Kern
John Machin wrote: Indeed yourself. Have you ever considered reading posts in chronological order, or reading all posts in a thread? That presumes that messages arrive in chronological order and transmissions are instantaneous. Neither are true. -- Robert Kern I have come to believe that the

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Gabriel Genellina
At Monday 6/11/2006 20:34, Robert Kern wrote: John Machin wrote: Indeed yourself. Have you ever considered reading posts in chronological order, or reading all posts in a thread? That presumes that messages arrive in chronological order and transmissions are instantaneous. Neither are

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Gabriel Genellina wrote: At Monday 6/11/2006 20:34, Robert Kern wrote: John Machin wrote: Indeed yourself. Have you ever considered reading posts in chronological order, or reading all posts in a thread? That presumes that messages arrive in chronological order and transmissions are

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Cameron Laird
John Machin wrote: Thomas W wrote: Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). My understanding of unicode has room for improvement, that's for sure. I got some pointers and initial

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread John Machin
Cameron Laird wrote: John Machin wrote: Thomas W wrote: Ok, I've cleaned up my code a bit and it seems as if I've encoded/decoded myself into a corner ;-). My understanding of unicode has room for improvement, that's for sure. I got some

Re: Unicode/ascii encoding nightmare

2006-11-06 Thread Hendrik van Rooyen
John Machin wrote: 8<--- I strongly suggest that you read the docs *FIRST*, and don't tinker at all. HTH, John This is *good* advice - it's unlikely to be followed though, as the OP is prolly just like most of us - you unpack the stuff