kj wrote:
Some people have mathphobia. I'm developing a wicked case of
Unicodephobia.
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordi
kj wrote:
  x = '%s' % y
  x = '%s' % z
  print y
  print z
  print y, z
Bear in mind that most Python implementations assume the "console"
only handles ASCII. So "print" output is converted to ASCII, which
can fail. (Actually, all modern Windows and Linux systems support
Un
On Wed, 10 Feb 2010 12:17:51 -0800, Anthony Tolle wrote:
> 4. Consider switching to Python 3.x, since there is only one string
> type (unicode).
However: one drawback of Python 3.x is that the repr() of a Unicode string
is no longer restricted to ASCII. There is an ascii() function which
behaves
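A small sketch of the repr()/ascii() difference mentioned here, for Python 3 only. The sample string is made up for illustration; nothing below comes from the original post.

```python
# Python 3: repr() keeps printable non-ASCII characters as they are,
# while ascii() escapes everything outside the ASCII range.
s = "caf\u00e9"              # hypothetical sample text, 'cafe' with e-acute

r = repr(s)                  # still contains the non-ASCII character
a = ascii(s)                 # backslash-escaped, pure ASCII

# Only the ASCII form is printed, so this runs on an ASCII-only console too.
print(a)
print(all(ord(ch) < 128 for ch in r))   # False: repr() result is not pure ASCII
print(all(ord(ch) < 128 for ch in a))   # True: ascii() result is
```

So ascii() is the safer choice when the output may end up on a terminal or log file whose encoding is unknown.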
On 2/11/2010 4:43 PM, mk wrote:
Neat, except that the process of porting most projects and external
libraries to P3 seems to be, how should I put it, standing still?
What is important are the libraries, so more new projects can start in
3.x. There is a slow trickle of 3.x support announcement
mk wrote:
> MRAB wrote:
>
>> When working with Unicode in Python 2, you should use the 'unicode' type
>> for text (Unicode strings) and limit the 'str' type to binary data
>> (bytestrings, ie bytes) only.
>
> Well OK, always use u'something', that's simple -- but isn't str what I
> get from files
On 2010-02-11 15:43 PM, mk wrote:
MRAB wrote:
Strictly speaking, only Unicode can be encoded.
How so? Can't bytestrings containing characters of, say, koi8r encoding
be encoded?
I think he means that only unicode objects can be encoded using the .encode()
method, as clarified by his next
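To make the direction of the two operations concrete, here is a rough sketch in Python 3 terms, where the 'unicode' type is called str and bytestrings are bytes. The sample character is an assumption chosen for illustration. Bytes that are "in koi8r" are decoded to text first; only the text can then be encoded, to KOI8-R or anything else.

```python
# Text (Unicode) is encoded to bytes; bytes are decoded to text.
text = "\u0416"                           # hypothetical sample: CYRILLIC CAPITAL LETTER ZHE

koi8_bytes = text.encode("koi8_r")        # text -> bytes in KOI8-R
same_text = koi8_bytes.decode("koi8_r")   # bytes -> text again

# "Re-encoding" a bytestring is really decode followed by encode.
utf8_bytes = koi8_bytes.decode("koi8_r").encode("utf-8")

print(len(koi8_bytes), len(utf8_bytes))   # 1 byte in KOI8-R, 2 bytes in UTF-8

# In Python 3 the bytes type has no encode() method at all.
print(hasattr(koi8_bytes, "encode"))      # False
```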
MRAB wrote:
When working with Unicode in Python 2, you should use the 'unicode' type
for text (Unicode strings) and limit the 'str' type to binary data
(bytestrings, ie bytes) only.
Well OK, always use u'something', that's simple -- but isn't str what I
get from files and sockets and the like
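One common way to handle this, sketched here for Python 3: treat whatever comes from a file or socket as bytes, decode it once on the way in, work with text inside the program, and encode once on the way out. io.BytesIO stands in for the file or socket, and the UTF-8 encoding of the input is an assumption; a real program has to know or detect the actual encoding.

```python
import io

# Stand-in for a file or socket that delivers raw bytes (assumed UTF-8).
incoming = io.BytesIO("na\u00efve\n".encode("utf-8"))

raw = incoming.read()            # bytes, as a file or socket would give us
text = raw.decode("utf-8")       # decode once at the input boundary

processed = text.upper()         # all internal work happens on text

outgoing = io.BytesIO()          # stand-in for the output file or socket
outgoing.write(processed.encode("utf-8"))   # encode once at the output boundary
```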
mk wrote:
kj wrote:
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
0: ordinal not in range(128)
(There, see? My Unicodephobia just went u
In mk
writes:
>To make matters more complicated, str.encode() internally DECODES from
>string into unicode:
> >>> nu
>'\xc4\x84'
> >>>
> >>> type(nu)
><type 'str'>
> >>> nu.encode()
>Traceback (most recent call last):
> File "", line 1, in
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in
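The hidden step can be modelled roughly as below. This is written for Python 3, where bytes has no encode() method, so the helper py2_style_encode is a hypothetical stand-in for what Python 2's str.encode() does: an implicit decode with the default codec (normally ascii), followed by the requested encode.

```python
def py2_style_encode(bytestring, encoding="ascii"):
    """Rough model of Python 2 str.encode(): implicit ASCII decode, then encode."""
    return bytestring.decode("ascii").encode(encoding)

nu = b"\xc4\x84"                 # the two bytes shown in the quoted session

try:
    py2_style_encode(nu)
    message = None
except UnicodeDecodeError as exc:
    # The *decode* step fails, even though encode() was what we asked for.
    message = str(exc)

print(message)

# Pure-ASCII bytestrings survive the hidden decode, which is why the
# problem often goes unnoticed until non-ASCII data shows up.
print(py2_style_encode(b"abc", "utf-8"))
```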
kj wrote:
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal
not in range(128)
(There, see? My Unicodephobia just went up a notch.)
In Duncan Booth
writes:
>kj wrote:
>> But to ground
>> the problem a bit I'll say that the exception above happens during
>> the execution of a statement of the form:
>>
>> x = '%s %s' % (y, z)
>>
>> Also, I found that, with the exact same values y and z as above,
>> all of the following
On Wed, 2010-02-10 at 12:17 -0800, Anthony Tolle wrote:
> On Feb 10, 2:09 pm, kj wrote:
> > Some people have mathphobia. I'm developing a wicked case of
> > Unicodephobia.
> > [snip]
>
> Some general advice (Looks like I am reiterating what MRAB said -- I
> type slower :):
>
> 1. If possible, u
On Wed, Feb 10, 2010 at 1:03 PM, kj wrote:
> In <402ac982-0750-4977-adb2-602b19149...@m24g2000prn.googlegroups.com>
Jonathan Gardner writes:
>>It sounds like someone, probably beautiful soup, is trying to turn
>>your strings into unicode. A full stacktrace would be useful to see
>>who did what w
On Wed, Feb 10, 2010 at 1:03 PM, kj wrote:
> >What are y and z?
>
> x = "%s %s" % (table['id'], table.tr.renderContents())
>
> where the variable table represents a BeautifulSoup.Tag instance.
>
> >Are they unicode or strings?
>
> The first item (table['id']) is unicode, and the second is str.
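Mixing the two types is what triggers the error: in Python 2, formatting a unicode value and a str value into one string promotes the result to unicode and decodes the str with the ascii codec. A sketch of that implicit step, and of the explicit alternative, written for Python 3. The values are made up, and UTF-8 as the encoding of the rendered bytes is an assumption, not something stated in the thread.

```python
table_id = "results"             # hypothetical stand-in for table['id'] (text)
rendered = b"\xc2\xa0cell"       # hypothetical stand-in for renderContents() (bytes)

# What Python 2 does behind the scenes when the two are mixed:
try:
    x = "%s %s" % (table_id, rendered.decode("ascii"))
except UnicodeDecodeError as exc:
    error = str(exc)

print(error)

# Decoding explicitly, with the encoding the bytes are really in (assumed
# UTF-8 here), avoids the implicit ASCII decode altogether.
x = "%s %s" % (table_id, rendered.decode("utf-8"))
print(ascii(x))
```

This also fits the observation that each value formats fine on its own: with only one of them in the format, no unicode/str mixing takes place.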
In <402ac982-0750-4977-adb2-602b19149...@m24g2000prn.googlegroups.com> Jonathan
Gardner writes:
>On Feb 10, 11:09 am, kj wrote:
>> FWIW, I'm using Python 2.6.  The example above happens to come from
>> a script that extracts data from HTML files, which are all in
>> English, but they are a
On Feb 10, 2:09 pm, kj wrote:
> Some people have mathphobia. I'm developing a wicked case of
> Unicodephobia.
> [snip]
Some general advice (Looks like I am reiterating what MRAB said -- I
type slower :):
1. If possible, use unicode strings for everything. That is, don't
use both str and unicod
kj wrote:
Some people have mathphobia. I'm developing a wicked case of
Unicodephobia.
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ord
kj wrote:
> But to ground
> the problem a bit I'll say that the exception above happens during
> the execution of a statement of the form:
>
> x = '%s %s' % (y, z)
>
> Also, I found that, with the exact same values y and z as above,
> all of the following statements work perfectly fine:
>
>
On Feb 10, 11:09 am, kj wrote:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> ordinal not in range(128)
>
You'll have to understand some terminology first.
"codec" is a description of how to encode and decode unicode data to a
stream of bytes.
"decode" means you
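A quick sketch of that terminology using the standard codecs module, in Python 3. The sample character is an assumption; its UTF-8 form happens to start with the 0xc2 byte from the error message, but the thread does not say what the offending data actually was.

```python
import codecs

# A codec bundles an encoder (text -> bytes) and a decoder (bytes -> text).
utf8 = codecs.lookup("utf-8")
ascii_codec = codecs.lookup("ascii")

nbsp = "\u00a0"                            # hypothetical sample: NO-BREAK SPACE

encoded, _ = utf8.encode(nbsp)             # encode: text -> bytes
decoded, _ = utf8.decode(encoded)          # decode: bytes -> text

print(encoded)                             # the UTF-8 bytes for the character

# The ascii codec only knows bytes 0..127, so decoding these bytes fails.
try:
    ascii_codec.decode(encoded)
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)
```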
Some people have mathphobia. I'm developing a wicked case of
Unicodephobia.
I have read a *ton* of stuff on Unicode. It doesn't even seem all
that hard. Or so I think. Then I start writing code, and WHAM:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal
not i