On 2011-02-28, Chris Rebert <c...@rebertia.com> wrote:
> On Sun, Feb 27, 2011 at 9:38 PM, monkeys paw <mon...@joemoney.net> wrote:
>> I have a working urlopen routine which opens
>> a url, parses it for <a> tags and prints out
>> the links in the page. On some sites, wikipedia for
>> instance, i get a
>>
>> HTTP error 403, forbidden.
>>
>> What is the difference in accessing the site through a web browser
>> and opening/reading the URL with python urllib2.urlopen?
>
> The User-Agent header (http://en.wikipedia.org/wiki/User_agent ).
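With urllib2 you can send a browser-style User-Agent by building a
Request object and passing a headers dict. A minimal sketch, assuming
Python 2's urllib2; the URL and agent string here are just placeholders:

    import urllib2

    url = 'http://en.wikipedia.org/wiki/Python_(programming_language)'
    # The default agent is "Python-urllib/2.x", which some sites reject
    # with a 403.  A browser-like User-Agent usually gets past that.
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib2.urlopen(req).read()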
Sometimes you also need to set the Referer header for pages that don't
allow direct-linking from "outside".

As somebody else has already said, if the site provides an API that
they want you to use, you should use it rather than hammering their web
server with a screen-scraper. Not only is it a lot less load on the
site, it's usually a lot easier.

-- 
Grant Edwards               grant.b.edwards at gmail.com
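If a Referer is needed as well, it goes in the same headers dict. A
rough sketch along the lines of the User-Agent example above; the URLs
are made up for illustration:

    import urllib2

    # Note: the HTTP header really is spelled "Referer" (one r).
    req = urllib2.Request('http://example.com/images/chart.png',
                          headers={'User-Agent': 'Mozilla/5.0',
                                   'Referer': 'http://example.com/page.html'})
    data = urllib2.urlopen(req).read()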