"Alex Ryu" <[EMAIL PROTECTED]> wrote > I'm trying to use python to automatically download and process a > (small) > number of wikipedia articles. However, I keep getting a 403 > (Forbidden > Error), when using urllib2:
FWIW I had a similar problem in trying to use Google to illustrate the use of urlib2 in my tutorial. It seems some wevb sites implement measures to prevent robotic access. I assume you could spoof your browsers characteristics and fool the system but I tend to take the view that if the owner doesn't like robots then I'd better respect that, so I haven't tried. All of which reminds me that I really need to finish writing that topic! :-) > File "G:\Python25\lib\urllib2.py", line 499, in http_error_default > raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) > HTTPError: HTTP Error 403: Forbidden > > Now, when I use urllib instead of urllib2, something different > happens: > > from 98.195.188.89 via sq27.wikimedia.org (squid/2.6.STABLE13) >to >>()<br/>\nError: ERR_ACCESS_DENIED, errno [No Error] at Sat, 27 Oct >>2007 HTH, -- Alan Gauld Author of the Learn to Program web site http://www.freenetpages.co.uk/hp/alan.gauld _______________________________________________ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor