Re: [Tutor] trying to convert pycurl/html to ascii

Cameron Simpson Sun, 29 Mar 2015 19:27:58 -0700

On 29Mar2015 21:49, bruce <badoug...@gmail.com> wrote:

Doing a quick/basic pycurl test on a site and trying to convert the
returned page to pure ascii.


And if the page cannot be represented in ASCII?
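
To make the question concrete, here is a minimal sketch of what happens to characters that have no ASCII form. It assumes the bytes really are ISO-8859-1, as the meta tag claims; the sample bytes are invented for illustration and are not from your page.

```python
# Hypothetical sample: ISO-8859-1 bytes containing an e-acute (0xe9)
# and a no-break space (0xa0). Not taken from the real page.
raw = b"caf\xe9\xa0menu"

# Bytes -> unicode text, using the encoding the page declares.
text = raw.decode("iso-8859-1")

# A strict conversion to ASCII fails: two characters have no ASCII form.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Lossy alternatives: drop or replace whatever ASCII cannot represent.
ignored = text.encode("ascii", "ignore")
replaced = text.encode("ascii", "replace")
```

Either lossy choice throws information away, so "pure ASCII" is only safe if you have decided what should happen to such characters.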

The page has the encoding line
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
The test uses pycurl, and the StringIO to fetch the page into a str.

Which StringIO? StringIO.StringIO or io.StringIO? In Python 2 the former is effectively bytes (Python 2 str) and the latter is unicode (as it is in Python 3).
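
The distinction matters because a text buffer will not accept raw bytes. A small sketch using only the io module, with io.BytesIO standing in for the byte-oriented buffer and an invented payload:

```python
import io

# Made-up ISO-8859-1 bytes, as a network library would deliver them.
payload = b"price:\xa010"

# A bytes buffer accepts bytes as-is.
byte_buf = io.BytesIO()
byte_buf.write(payload)
data = byte_buf.getvalue()

# A text buffer wants unicode and rejects bytes with a TypeError.
text_buf = io.StringIO()
try:
    text_buf.write(payload)
    accepted = True
except TypeError:
    accepted = False

# Decode first, and the text buffer accepts the result.
text_buf.write(data.decode("iso-8859-1"))
```

So the type of buffer decides whether decoding has to happen before the write or after the read.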

pycurl stuff
foo=gg.getBuffer()
-at this point, foo has the page in a str buffer.
What's happening is that the test is getting the following kind of error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 20:
invalid start byte

Please show us more of the code, preferably a complete example as small as possible to reproduce the exception. We have no idea what "gg" is or how it was obtained.
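
For instance, something as small as the following would do. This is only a sketch: the byte string is an invented stand-in for the fetched page, built so that byte 0xa0 sits at position 20 as in your traceback, and it assumes the real data is ISO-8859-1.

```python
# Invented stand-in for the fetched page: 20 ASCII bytes, then 0xa0.
page = b"<html><body>Hello!!!" + b"\xa0world</body></html>"

# Decoding as UTF-8 fails: 0xa0 cannot start a UTF-8 sequence.
try:
    page.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the charset the page declares succeeds;
# 0xa0 is a no-break space in ISO-8859-1.
decoded = page.decode("iso-8859-1")
```

An example of this size lets us see exactly which call decodes the data and with which codec.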

The test is using python 2.6 on redhat.
I've tried different decode functions based on different
sites/articles/stackoverflow but can't quite seem to resolve the issue.


Flailing about on stackoverflow sounds a bit random.

Have you consulted the PycURL documentation, especially this page:

 http://pycurl.sourceforge.net/doc/unicode.html

which looks like it ought to discuss your problem.

Cheers,
Cameron Simpson <c...@zip.com.au>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
