I am trying to write a fix for this bug: http://bugs.python.org/issue2464 ("urllib2 can't handle http://www.wikispaces.com").
What is actually happening here:

1) urllib2 tries to open http://www.wikispaces.com
2) It gets 302-redirected to https://session.wikispaces.com/session/auth?authToken=1bd8784307f89a495cc1aafb075c4983
3) It gets 302-redirected again, to http://www.wikispaces.com?responseToken=1bd8784307f89a495cc1aafb075c4983

After this it gets a 200 code, but the page retrieved is a 400 Bad Request! Firefox has no problem getting the actual page, though.

Here is the output of the session (I added a print header.items() to the http_error_302 method of HTTPRedirectHandler):

>>> obj1 = urllib2.urlopen("http://www.wikispaces.com")
[('content-length', '0'), ('x-whom', 'w9-prod-http, p1'), ('set-cookie', 'slave=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, test=1; expires=Wed, 13-Aug-2008 13:03:51 GMT; path=/'), ('server', 'nginx/0.6.30'), ('connection', 'close'), ('location', 'https://session.wikispaces.com/session/auth?authToken=4b3eecb5c1ab301689e446cf03b3a585'), ('date', 'Wed, 13 Aug 2008 12:33:51 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html; charset=utf-8')]
[('content-length', '0'), ('x-whom', 'w8-prod-https, p1'), ('set-cookie', 'master=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, master=7de5d46e15fd23b1ddf782c565d4fb3a; expires=Thu, 14-Aug-2008 13:03:53 GMT; path=/; domain=session.wikispaces.com'), ('server', 'nginx/0.6.30'), ('connection', 'close'), ('location', 'http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585'), ('date', 'Wed, 13 Aug 2008 12:33:53 GMT'), ('p3p', 'CP: ALL DSP COR CURa ADMa DEVa CONo OUR IND ONL COM NAV INT CNT STA'), ('content-type', 'text/html; charset=utf-8')]
>>> print obj1.geturl()
http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585
>>> print obj1.code
200
>>> print obj1.headers
>>> print obj1.info()
>>> print obj1.read()
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>nginx/0.6.30</center>
</body>
</html>

So while urllib2 ends up with this 400 page, Firefox handles the same URL properly.

I also notice that if I suffix the URL with a dummy path, say url = "http://www.wikispaces.com/dummy_url_path", the urlopen request still goes through 302-302-200, but with dummy_url_path appended in the redirections, and then read() succeeds!

Please share your opinion on where you think urllib2 is going wrong here; I have not been able to drill down to the fault point. This does NOT have to do with the null characters in the redirection URL noted in the bug report.

Thanks,
Senthil
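P.S. In case anyone wants to reproduce the header dump above without editing urllib2.py in place (I had simply added the print to the stock http_error_302), a subclass of HTTPRedirectHandler gives the same effect. This is only a rough sketch for reproducing the problem; the DebugRedirectHandler name is mine, and nothing here is meant as the eventual fix:

import urllib2

class DebugRedirectHandler(urllib2.HTTPRedirectHandler):
    # Dump the 302 response headers before letting the default
    # handler follow the Location header.
    def http_error_302(self, req, fp, code, msg, headers):
        print headers.items()
        return urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

opener = urllib2.build_opener(DebugRedirectHandler)

# Bare host: follows 302 -> 302 -> 200, but read() returns the 400 page.
obj1 = opener.open("http://www.wikispaces.com")
print obj1.geturl(), obj1.code

# With a dummy path appended, the same 302-302-200 chain ends in a
# page that read() retrieves fine.
obj2 = opener.open("http://www.wikispaces.com/dummy_url_path")
print obj2.geturl(), obj2.code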