Tim Arnold wrote:
> "?? ???" wrote in message
> news:ciqh56-ses@archaeopteryx.softver.org.mk...
>> So, I'm using lxml to screen scrape a site that uses the Cyrillic
>> alphabet (windows-1251 encoding). The site's HTML doesn't have the <meta ..content-type.. charset=..> header, but does
So, I'm using lxml to screen scrape a site that uses the Cyrillic
alphabet (windows-1251 encoding). The site's HTML doesn't have the <meta ..content-type.. charset=..> header, but does have an HTTP header that
specifies the charset... so they are standards-compliant enough.
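To make the setup above concrete, here is a minimal sketch (not the poster's code) of reading the charset from an HTTP Content-Type header and decoding the raw bytes with it, using only the standard library. The helper names charset_from_content_type and decode_body, the sample header value, and the sample markup are all assumptions made for illustration.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper: falls back to `default` when the header
    carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def decode_body(raw_bytes, header_value):
    """Decode a response body using the charset named in the header."""
    return raw_bytes.decode(charset_from_content_type(header_value))


# Assumed sample data: a header like the one the site is described as
# sending, and a small Cyrillic fragment encoded the same way.
header = "text/html; charset=windows-1251"
raw = "<p>Здраво</p>".encode("windows-1251")
text = decode_body(raw, header)
```

The decoded string should then be usable with lxml.html.fromstring(); as far as I know, an alternative is to pass the raw bytes together with a parser created as lxml.html.HTMLParser(encoding="windows-1251").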
Now when I run this code:
from lxml import html
doc = htm