Re: [Tutor] trying to convert pycurl/html to ascii

Dave Angel Sun, 29 Mar 2015 19:10:50 -0700

On 03/29/2015 09:49 PM, bruce wrote:

Hi.


Doing a quick/basic pycurl test on a site and trying to convert the
returned page to pure ascii.

You cannot convert it to pure ASCII. You could replace all the invalid characters with some special one, like question marks. But I doubt if that's what you really want.
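For illustration, here is a minimal sketch of what that lossy replacement looks like. The sample bytes are made up (they are not from the actual page), and the sketch assumes the data really is ISO-8859-1:

```python
# Hypothetical sample bytes, not taken from the real page.
# 0xe9 is e-acute and 0xa0 is a non-breaking space in ISO-8859-1.
raw = b"caf\xe9\xa0menu"

# Step 1: bytes -> unicode, using the encoding the page declares.
text = raw.decode("iso-8859-1")

# Step 2: unicode -> ASCII bytes; every character that ASCII cannot
# represent is replaced with "?".
ascii_bytes = text.encode("ascii", "replace")

print(ascii_bytes)
```

Both non-ASCII characters come out as question marks, so the original information is gone. That is why this is rarely what you want.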


The page has the encoding line

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">


That would mean you should use ISO-8859-1 (also known as latin-1) in your decode.
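A minimal sketch of such a decode, assuming the buffer holds the raw bytes exactly as the server sent them. The value assigned to foo here is a made-up stand-in for the real page:

```python
# Hypothetical stand-in for the str that pycurl wrote into the buffer.
# 0xa0 is a non-breaking space and 0xa3 is a pound sign in ISO-8859-1.
foo = b"<p>price:\xa0\xa35</p>"

# Decode using the charset named in the page's meta tag.
page = foo.decode("iso-8859-1")

# ISO-8859-1 maps every byte to exactly one character, so the lengths match.
print(len(foo), len(page))
```

The result is a unicode object, which you can then re-encode however the rest of your program needs.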


The test uses pycurl, and the StringIO to fetch the page into a str.

pycurl stuff
.
.
.
foo=gg.getBuffer()

-at this point, foo has the page in a str buffer.


What's happening is that the test is getting the following kind of error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
invalid start byte

That's not the whole error. You need to show the whole stack trace, not just a single line. It would also be really useful if you showed the lines between the foo= line and the one that gets the error.


The test is using python 2.6 on redhat.

Very good to tell us that.  It makes a huge difference.

I've tried different decode functions based on different
sites/articles/stackoverflow but can't quite seem to resolve the issue.

Pick one, show us the code, and show us the full error traceback, and somebody can help. As it stands, all I can tell you is that a decode takes a byte string and an encoding name, and produces a unicode object. And it's not going to give you a utf-8 error if you're trying to decode 8859.
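To make that last point concrete, here is a small sketch with made-up bytes (again, not the real page). It contrasts a utf-8 decode, which rejects the 0xa0 byte, with an ISO-8859-1 decode, which accepts any byte value:

```python
# Hypothetical bytes containing 0xa0, the byte named in the error message.
raw = b"hello\xa0world"

# Decoding as utf-8 fails: 0xa0 cannot start a utf-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as ISO-8859-1 cannot fail, since every byte value is defined.
latin1_text = raw.decode("iso-8859-1")

print(repr(utf8_error))
print(repr(latin1_text))
```

So if the traceback mentions the utf8 codec, some line is decoding as utf-8, either explicitly or implicitly, and that is the line we need to see.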


--
DaveA
