On 03/29/2015 09:49 PM, bruce wrote:
Hi.

Doing a quick/basic pycurl test on a site and trying to convert the
returned page to pure ascii.

You cannot convert it to pure ASCII if the page contains any non-ASCII characters. You could replace each character that has no ASCII equivalent with a placeholder such as a question mark, but I doubt that's what you really want.
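
To make the question-mark idea concrete, here is a minimal sketch. The byte string `raw` is a made-up sample, not the actual page, and I'm assuming it is ISO-8859-1 encoded. The "replace" error handler is what substitutes "?" for anything ASCII cannot represent.

```python
# Hypothetical sample bytes, assumed to be ISO-8859-1 encoded:
# 0xe9 is an accented "e", 0xa0 is a non-breaking space.
raw = b"caf\xe9\xa0au lait"

# Step 1: bytes -> unicode text, using the real source encoding.
text = raw.decode("iso-8859-1")

# Step 2: unicode text -> ASCII bytes; "replace" turns each
# character with no ASCII equivalent into "?".
ascii_only = text.encode("ascii", "replace")
```

Note that this loses information: the accented letter and the non-breaking space are both gone, which is why it is rarely what you actually want.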


The page has the encoding line

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

That would mean you should use ISO-8859-1 (also known as latin-1) as the encoding name in your decode.
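
As a rough sketch of that, assuming the page bytes are already in a byte string: the helper `decode_page` and the sample `page` below are hypothetical names of mine, and the regex is a simplification that only looks for a `charset=` token rather than parsing the HTML properly.

```python
import re

def decode_page(raw, default="iso-8859-1"):
    """Decode raw page bytes using the charset named in the page itself.

    Hypothetical helper: falls back to `default` when no charset is found.
    """
    match = re.search(br'charset=["\']?([A-Za-z0-9_\-]+)', raw)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)

# Made-up sample: the meta line from the post followed by a 0xa0 byte.
page = (b'<meta http-equiv="Content-Type" '
        b'content="text/html;charset=ISO-8859-1">price:\xa0100')
text = decode_page(page)   # a unicode object; 0xa0 becomes u"\xa0"
```

If the server also sends a charset in its HTTP Content-Type header, that value may be a better source than the meta tag; this sketch ignores it.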


The test uses pycurl and StringIO to fetch the page into a str.

pycurl stuff
.
.
.
foo=gg.getBuffer()

-at this point, foo has the page in a str buffer.


What's happening is that the test is getting the following kind of error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
invalid start byte

That's not the whole error. You need to show the whole stack trace, not just a single line. It would also be really useful if you showed the lines between the foo= line and the one that gets the error.



The test is using python 2.6 on redhat.

Very good to tell us that.  It makes a huge difference.

I've tried different decode functions based on different
sites/articles/stackoverflow but can't quite seem to resolve the issue.


Pick one, show us the code, and show us the full error traceback, and somebody can help. As it stands, all I can tell you is that a decode takes a byte string and an encoding name, and produces a unicode object. And it's not going to give you a utf-8 error if you're trying to decode 8859.
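
A small illustration of that last point, using a made-up byte string rather than the real page: the same 0xa0 byte decodes fine as ISO-8859-1 but raises UnicodeDecodeError when the encoding name is utf-8. So a 'utf8' codec error suggests that some call in the code is decoding as UTF-8, whatever was intended.

```python
# Hypothetical sample containing the byte from the error message.
raw = b"total:\xa0100"

# Decoding as ISO-8859-1 cannot fail: every byte value maps to a character.
text = raw.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 fails, because 0xa0 is not a
# valid start byte in UTF-8.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```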

--
DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
