Re: Unicode characters in btye-strings

2010-03-12 Thread Martin v. Loewis
Michael Rudolf wrote:
> On 12.03.2010 21:56, Martin v. Loewis wrote:
>> (*) If a source encoding was given, the source is actually recoded to
>> UTF-8, parsed, and then re-encoded back into the original encoding.
>
> Why is that?

Why is what? That string literals get reencoded into the source e
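The recode, parse, re-encode cycle from the quoted footnote can be illustrated as a round trip through UTF-8. This is only a sketch of the idea in Python 3, not CPython's actual tokenizer code; the Latin-1 source encoding and all variable names are assumptions chosen for illustration.

```python
# Hypothetical source bytes for the literal "éâÄ", saved as Latin-1.
source_encoding = "latin-1"
original_bytes = "éâÄ".encode(source_encoding)

# Step 1: recode the source to UTF-8 (the form the parser works on).
utf8_bytes = original_bytes.decode(source_encoding).encode("utf-8")

# Step 2: after parsing, re-encode the byte-string literal back
# into the declared source encoding.
literal_bytes = utf8_bytes.decode("utf-8").encode(source_encoding)

print(len(original_bytes), len(utf8_bytes), len(literal_bytes))
print(literal_bytes == original_bytes)
```

Under these assumptions the literal ends up holding the same bytes that were in the file, even though the intermediate form was UTF-8.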

Re: Unicode characters in btye-strings

2010-03-12 Thread John Bokma
Michael Rudolf writes:
> On 12.03.2010 21:56, Martin v. Loewis wrote:
>> (*) If a source encoding was given, the source is actually recoded to
>> UTF-8, parsed, and then re-encoded back into the original encoding.
>
> Why is that? So "unicode"-strings (as in u"string") are not really
> unicode-

Re: Unicode characters in btye-strings

2010-03-12 Thread Michael Rudolf
On 12.03.2010 21:56, Martin v. Loewis wrote:
> (*) If a source encoding was given, the source is actually recoded to
> UTF-8, parsed, and then re-encoded back into the original encoding.

Why is that? So "unicode"-strings (as in u"string") are not really unicode-, but utf8-strings? Need citatio

Re: Unicode characters in btye-strings

2010-03-12 Thread Martin v. Loewis
>> Can somebody explain what happens when I put non-ASCII characters into a
>> non-unicode string? My guess is that the result will depend on the
>> current encoding of my terminal.
>
> Exactly right.

To elaborate on the "what happens" part: the string that gets entered is typically passed as a b
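To make the dependence on the terminal encoding concrete, here is a minimal Python 3 sketch that encodes the same three characters under a few encodings a terminal might plausibly use. The choice of encodings and the names `text` and `encoded` are assumptions for illustration only.

```python
text = "éâÄ"

# Encode the same characters the way terminals with different
# encodings would hand them to the interpreter.
encoded = {}
for encoding in ("utf-8", "latin-1", "cp437"):
    encoded[encoding] = text.encode(encoding)
    print(encoding, len(encoded[encoding]), list(encoded[encoding]))
```

The byte count and byte values differ per encoding, which is why a Python 2 byte string typed at the prompt is only meaningful together with the encoding that produced it.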

Re: Unicode characters in btye-strings

2010-03-12 Thread Robert Kern
On 2010-03-12 06:35 AM, Steven D'Aprano wrote:
> I know this is wrong, but I'm not sure just how wrong it is, or why.
> Using Python 2.x:
>
> >>> s = "éâÄ"
> >>> print s
> éâÄ
> >>> len(s)
> 6
> >>> list(s)
> ['\xc3', '\xa9', '\xc3', '\xa2', '\xc3', '\x84']
>
> Can somebody explain what happens when I put non-ASCII characters i

Unicode characters in btye-strings

2010-03-12 Thread Steven D'Aprano
I know this is wrong, but I'm not sure just how wrong it is, or why.
Using Python 2.x:

>>> s = "éâÄ"
>>> print s
éâÄ
>>> len(s)
6
>>> list(s)
['\xc3', '\xa9', '\xc3', '\xa2', '\xc3', '\x84']

Can somebody explain what happens when I put non-ASCII characters into a non-unicode string? My guess is
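For comparison, the same experiment can be sketched in Python 3, where text and bytes are separate types. This assumes the terminal in the session above was using UTF-8, which matches the six byte values shown; the names `s` and `b` are illustrative.

```python
s = "éâÄ"               # three Unicode code points
b = s.encode("utf-8")   # the bytes a UTF-8 terminal would send

print(len(s))                  # number of characters
print(len(b))                  # number of bytes
print([hex(x) for x in b])     # individual byte values
```

Each of the three characters takes two bytes in UTF-8, which accounts for the length of 6 reported by Python 2 for the byte string.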