Re: Can't get the real contents from a page on the internet because of the "no-cache" tag

Diez B. Roggisch Thu, 23 Mar 2006 02:55:44 -0800

dongdong wrote:

> Using a web browser I can get the page's content normally, but when I use
>
> urllib2.urlopen("http://tech.163.com/2004w11/12732/2004w11_1100059465339.html").read()
> 
> the result is
> 
> <html><head><META HTTP-EQUIV=REFRESH
> CONTENT="0;URL=http://tech.163.com/04/1110/12/14QUR2BR0009159H.html">
> <META http-equiv="Pragma"
> content="no-cache"></HEAD><body>?y?ú'ò?aò3??...</body></html>
> 
> I think the reason is the no-cache tag. Could someone help me?


No, the reason is the <META HTTP-EQUIV=REFRESH
CONTENT="0;URL=http://tech.163.com/04/1110/12/14QUR2BR0009159H.html">

tag, which redirects you to the real page. urllib2 does not act on it, so
extract that URL from the page and request it yourself. Or maybe you can use
webunit, which acts more like a "real" HTTP client by interpreting such
content.
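
Extracting the URL and requesting it could look roughly like the sketch
below. It assumes Python 3, where urllib2 has become urllib.request, and the
helper names find_meta_refresh and fetch_following_refresh are made up for
illustration. The regular expression only covers the simple attribute order
shown in the page above; a real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches <META HTTP-EQUIV=REFRESH CONTENT="0;URL=..."> with or without
# quotes, in that attribute order only (an assumption of this sketch).
META_REFRESH = re.compile(
    r'<meta\s+http-equiv\s*=\s*["\']?refresh["\']?\s+'
    r'content\s*=\s*["\']?\s*\d+\s*;\s*url\s*=\s*([^"\'>\s]+)',
    re.IGNORECASE,
)


def find_meta_refresh(html, base_url):
    """Return the absolute redirect target, or None if there is none."""
    match = META_REFRESH.search(html)
    if match is None:
        return None
    # urljoin resolves relative targets against the page's own URL.
    return urljoin(base_url, match.group(1))


def fetch_following_refresh(url, max_hops=5):
    """Fetch url, following up to max_hops meta-refresh redirects."""
    for _ in range(max_hops):
        raw = urlopen(url).read()
        # latin-1 decodes any byte sequence, which is enough to locate
        # an ASCII tag regardless of the page's real encoding.
        target = find_meta_refresh(raw.decode("latin-1"), url)
        if target is None:
            return raw
        url = target
    raise RuntimeError("too many meta-refresh redirects")
```

Calling fetch_following_refresh() with the original URL would then return
the bytes of the page the refresh tag points to, provided the site still
serves it. The max_hops limit is there so that two pages refreshing to each
other cannot loop forever.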

diez
-- 
http://mail.python.org/mailman/listinfo/python-list
