Hi all,

I'm trying to use Python to automatically download and process a (small) number of Wikipedia articles. However, I keep getting a 403 (Forbidden) error when using urllib2:
>>> import urllib2
>>> ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")

which gives this:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    ip = urllib2.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
  File "G:\Python25\lib\urllib2.py", line 121, in urlopen
    return _opener.open(url, data)
  File "G:\Python25\lib\urllib2.py", line 380, in open
    response = meth(req, response)
  File "G:\Python25\lib\urllib2.py", line 491, in http_response
    'http', request, response, code, msg, hdrs)
  File "G:\Python25\lib\urllib2.py", line 418, in error
    return self._call_chain(*args)
  File "G:\Python25\lib\urllib2.py", line 353, in _call_chain
    result = func(*args)
  File "G:\Python25\lib\urllib2.py", line 499, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden

Now, when I use urllib instead of urllib2, something different happens:

>>> import urllib
>>> ip2 = urllib.urlopen("http://en.wikipedia.org/wiki/Pythonidae")
>>> st = ip2.read()

However, st does not contain the hoped-for page - instead it is a page of HTML and (maybe?) JavaScript, which ends in:

    If reporting this error to the Wikimedia System Administrators, please include the following details:<br/>\n<span style="font-style: italic">\nRequest: GET http://en.wikipedia.org/wiki/Pythonidae, from 98.195.188.89 via sq27.wikimedia.org (squid/2.6.STABLE13) to ()<br/>\nError: ERR_ACCESS_DENIED, errno [No Error] at Sat, 27 Oct 2007 06:45:00 GMT\n</span>\n</div>\n\n</body>\n</html>\n'

Could anybody tell me what's going on, and what I should be doing differently?

Thanks for your time,
Alex
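P.S. One thing I haven't tried yet is sending a custom User-Agent header with the request, in case the server is rejecting whatever urllib2 sends by default. A rough sketch of what I mean (untested; the user-agent string and email address are just placeholder values I made up):

    import urllib2

    # Identify the script with a custom User-Agent header (placeholder value)
    headers = {"User-Agent": "WikipediaFetcher/0.1 (alex@example.com)"}
    req = urllib2.Request("http://en.wikipedia.org/wiki/Pythonidae", headers=headers)
    ip = urllib2.urlopen(req)   # file-like object, same as before
    st = ip.read()

Does that seem like the right direction, or is something else going on?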