Errrrrrrr, it gets worse: not only is the title written in Chinese, it is encoded as gb2312 -- here is the repr() of the first few chunks:
"<html>\n<head>\n <title>\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : \xc4\xd a\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - \xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</ti tle>\n<meta http-equiv='Content-Type' content='text/html; charset=gb2312'>\n" and here is what you get after that_guff.decode('gb2312') u"<html>\n<head>\n <title>\u4e2d\u56fd\u77f3\u5316(600028) : \u5185\u90e8\u 4eba\u5458\u6301\u80a1 - \u641c\u72d0\u80a1\u7968</title>\n<meta http-equiv='Con tent-Type' content='text/html; charset=gb2312'>\n" The first 2 characters of the title are recognisable both visually on the browser title and in the unicode as "zhong guo" i.e. China. BUT the OP's first message is interpreting that gb2312-encoded stuff as Unicode: s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) ' *SOMEBODY* is seriously deluded, and it ain't me, and it ain't Serge :-) ... and yes Peter, info travels faster also from China that it does from Armenia :-()) -- http://mail.python.org/mailman/listinfo/python-list