On 2011-02-28, Chris Rebert <c...@rebertia.com> wrote:
> On Sun, Feb 27, 2011 at 9:38 PM, monkeys paw <mon...@joemoney.net> wrote:
>> I have a working urlopen routine which opens
>> a url, parses it for <a> tags and prints out
>> the links in the page. On some sites, wikipedia for
>> instance, i get a
>>
>> HTTP error 403, forbidden.
>>
>> What is the difference in accessing the site through a web browser
>> and opening/reading the URL with python urllib2.urlopen?
>
> The User-Agent header (http://en.wikipedia.org/wiki/User_agent ).
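With urllib2 you can send a browser-style User-Agent by building a
Request object and passing a headers dict. A minimal sketch, assuming
Python 2's urllib2; the URL and agent string here are just placeholders:

    import urllib2

    url = 'http://en.wikipedia.org/wiki/Python_(programming_language)'
    # The default agent is "Python-urllib/2.x", which some sites reject
    # with a 403.  A browser-like User-Agent usually gets past that.
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    html = urllib2.urlopen(req).read()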
Sometimes you also need to set the Referer header for pages that don't
allow direct-linking from "outside".

As somebody else has already said, if the site provides an API that
they want you to use, you should use it rather than hammering their web
server with a screen-scraper. Not only is it a lot less load on the
site, it's usually a lot easier.

-- 
Grant Edwards               grant.b.edwards at gmail.com
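If a Referer is needed as well, it goes in the same headers dict. A
rough sketch along the lines of the User-Agent example above; the URLs
are made up for illustration:

    import urllib2

    # Note: the HTTP header really is spelled "Referer" (one r).
    req = urllib2.Request('http://example.com/images/chart.png',
                          headers={'User-Agent': 'Mozilla/5.0',
                                   'Referer': 'http://example.com/page.html'})
    data = urllib2.urlopen(req).read()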