In article 499f397c.7030...@v.loewis.de,
Martin v. Löwis mar...@v.loewis.de wrote:
Yes, I know that. But every concrete representation of a unicode string
has to have an encoding associated with it, including unicode strings
produced by the Python parser when it
* Martin v. Löwis (Sat, 21 Feb 2009 00:15:08 +0100)
Yes, I know that. But every concrete representation of a unicode
string has to have an encoding associated with it, including unicode
strings produced by the Python parser when it parses the ascii
string u'\xb5'
My question is: what
On Sat, Feb 21, 2009 at 7:24 PM, Thorsten Kampe
thors...@thorstenkampe.de wrote:
I'm pretty much sure it is UCS-2 or UCS-4. (Yes, I know there is only a
slight difference to UTF-16/UTF-32).
I wouldn't call the difference that slight, especially between UTF-16
and UCS-2, since the former can
My question is: what is that encoding?
The internal representation is either UTF-16, or UTF-32; which one is
a compile-time choice (i.e. when the Python interpreter is built).
Wait, I thought it was UCS-2 or UCS-4? Or am I misremembering the
countless threads about the distinction between
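
For anyone wanting to check which storage width their own interpreter was built with, sys.maxunicode is the usual place to look. This is only a minimal sketch, assuming the documented behaviour that narrow (2-byte) builds of Python 2 report 0xFFFF and wide builds report 0x10FFFF; on Python 3.3 and later the value is always 0x10FFFF, because PEP 393 removed the build-time choice. The helper name describe_build is made up for illustration.

```python
import sys

def describe_build(maxunicode=sys.maxunicode):
    """Classify an interpreter by its largest code point (hypothetical helper)."""
    if maxunicode == 0xFFFF:
        # 2-byte build: code points above U+FFFF are stored as surrogate pairs
        return "narrow build: 16-bit code units"
    if maxunicode == 0x10FFFF:
        # 4-byte build of Python 2, or any Python 3.3+
        return "wide build (or Python 3.3+): one element per code point"
    return "unexpected value: %#x" % maxunicode

print(describe_build())
```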
I'm pretty much sure it is UCS-2 or UCS-4. (Yes, I know there is only a
slight difference to UTF-16/UTF-32).
I wouldn't call the difference that slight, especially between UTF-16
and UCS-2, since the former can encode all Unicode code points, while
the latter can only encode those in the
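
The practical difference shows up with any character outside the Basic Multilingual Plane. A small sketch, using U+1D11E (MUSICAL SYMBOL G CLEF) as an arbitrary example character: UTF-16 represents it as two 16-bit code units (a surrogate pair), which strict UCS-2 has no way to express, while a BMP character such as U+00B5 takes a single code unit in both.

```python
import struct

ch = u'\U0001D11E'                 # outside the BMP
utf16 = ch.encode('utf-16-be')     # big-endian, no byte order mark
high, low = struct.unpack('>HH', utf16)

# Both halves fall in the reserved surrogate ranges.
is_pair = 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF
print(len(utf16), hex(high), hex(low), is_pair)

# A BMP character needs only one 16-bit unit.
bmp = u'\xb5'.encode('utf-16-be')
print(len(bmp))
```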
On Sat, Feb 21, 2009 at 9:10 PM, Martin v. Löwis mar...@v.loewis.de wrote:
I'm pretty much sure it is UCS-2 or UCS-4. (Yes, I know there is only a
slight difference to UTF-16/UTF-32).
I wouldn't call the difference that slight, especially between UTF-16
and UCS-2, since the former can encode
On Feb 21, 10:48 am, a...@pythoncraft.com (Aahz) wrote:
In article 499f397c.7030...@v.loewis.de,
Martin v. Löwis mar...@v.loewis.de wrote:
Yes, I know that. But every concrete representation of a unicode string
has to have an encoding associated with it,
Indeed. As Python *can* encode all characters even in 2-byte mode
(since PEP 261), it seems clear that Python's Unicode representation
is *not* strictly UCS-2 anymore.
Since we're already discussing this, I'm curious - why was UCS-2
chosen over plain UTF-16 or UTF-8 in the first place for
On Sat, Feb 21, 2009 at 9:45 PM, Martin v. Löwis mar...@v.loewis.de wrote:
Indeed. As Python *can* encode all characters even in 2-byte mode
(since PEP 261), it seems clear that Python's Unicode representation
is *not* strictly UCS-2 anymore.
Since we're already discussing this, I'm curious -
I would have thought that the answer would be: the default encoding
(duh!) But empirically this appears not to be the case:
unicode('\xb5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 0:
ordinal not in
Ron Garret wrote:
I would have thought that the answer would be: the default encoding
(duh!) But empirically this appears not to be the case:
unicode('\xb5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in
Stefan Behnel wrote:
print u'\xb5'
µ
What you
see in the last line is what the Python interpreter makes of your unicode
string when passing it into stdout, which in your case seems to use a
latin-1 encoding (check your environment settings for that).
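
To make the point about output concrete: the unicode string itself holds the code point U+00B5, and bytes only come into existence when it is encoded for a particular destination. A rough sketch, assuming nothing about your terminal; sys.stdout.encoding just reports whatever the environment provides and may be None when output is redirected.

```python
import sys

s = u'\xb5'                        # MICRO SIGN, code point U+00B5
print(sys.stdout.encoding)         # environment dependent

# The same code point becomes different bytes under different encodings.
latin1_bytes = s.encode('latin-1')
utf8_bytes = s.encode('utf-8')
print(repr(latin1_bytes), repr(utf8_bytes))
```

A latin-1 terminal receives the single byte 0xb5, while a UTF-8 terminal receives the two bytes 0xc2 0xb5; both display the same character.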
The "seems to" is misleading. The
In article 499f18bd$0$31879$9b4e6...@newsspool3.arcor-online.net,
Stefan Behnel stefan...@behnel.de wrote:
Ron Garret wrote:
I would have thought that the answer would be: the default encoding
(duh!) But empirically this appears not to be the case:
unicode('\xb5')
Traceback (most
Ron Garret wrote:
I would have thought that the answer would be: the default encoding
(duh!) But empirically this appears not to be the case:
unicode('\xb5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in
Ron Garret rnospa...@flownet.com writes:
Put this another way: I would have thought that when the Python parser
parses u'\xb5' it would produce the same result as calling
unicode('\xb5'), but it doesn't. Instead it seems to produce the same
result as calling unicode('\xb5', 'latin-1'). But my
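
The two cases can be put side by side. In a unicode literal, \xb5 is an escape that names code point U+00B5 directly, so no decoding step is involved; decoding the byte 0xb5 is a different operation that needs a codec. The sketch below is written so that it also runs on Python 3, using bytes.decode where Python 2 would call unicode(). It matches latin-1 only because that codec maps bytes 0-255 to the code points with the same numbers.

```python
raw = b'\xb5'          # one byte, no encoding information attached

# Decoding with the ascii codec fails, as in the traceback above.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as latin-1 yields U+00B5, the same value the literal names.
from_latin1 = raw.decode('latin-1')
literal = u'\xb5'
print(ascii_ok, from_latin1 == literal, hex(ord(literal)))
```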
Yes, I know that. But every concrete representation of a unicode string
has to have an encoding associated with it, including unicode strings
produced by the Python parser when it parses the ascii string u'\xb5'
My question is: what is that encoding?
The internal representation is either
u'\xb5'
u'\xb5'
print u'\xb5'
µ
Unicode literals are *in the source file*, which can only have one
encoding (for a given source file).
(That last character shows up as a micron sign despite the fact that
my default encoding is ascii, so it seems to me that that unicode
string must
In article 499f3a8f.9010...@v.loewis.de,
Martin v. Löwis mar...@v.loewis.de wrote:
u'\xb5'
u'\xb5'
print u'\xb5'
µ
Unicode literals are *in the source file*, which can only have one
encoding (for a given source file).
(That last character shows up as a micron sign despite the
In article 499f397c.7030...@v.loewis.de,
Martin v. Löwis mar...@v.loewis.de wrote:
Yes, I know that. But every concrete representation of a unicode string
has to have an encoding associated with it, including unicode strings
produced by the Python parser when it parses the ascii string
Martin v. Löwis wrote:
somehow have picked up a latin-1 encoding.)
I think latin-1 was the default without a coding cookie line. (May be
utf-8 in 3.0).
It is, but that's irrelevant for the example. In the source
u'\xb5'
all characters are ASCII (i.e. all of letter u, single
quote,
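
This is easy to check by treating the literal as data. A small sketch, assuming a Python where ast.literal_eval accepts the u prefix (Python 2.6+ or 3.3+): the source text consists of seven ASCII characters, yet evaluating it produces a one-character string holding U+00B5, whatever the encoding of the source file.

```python
import ast

source = "u'\\xb5'"            # the seven characters  u ' \ x b 5 '
source.encode('ascii')          # succeeds: nothing non-ASCII in the source text

# Evaluating the literal interprets the escape, giving one character.
value = ast.literal_eval(source)
print(len(source), len(value), hex(ord(value)))
```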