On Jan 11, 2009, at 10:05 PM, James Mills wrote:
On Mon, Jan 12, 2009 at 12:58 PM, Philip Semanchuk <phi...@semanchuk.com> wrote:
On Jan 11, 2009, at 8:59 PM, James Mills wrote:
Hey all,
The following fails for me:
from urllib2 import urlopen
f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 389, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
However, that _same_ url works perfectly fine on the
same machine (and same network) using any of:
* curl
* wget
* elinks
* firefox
Any helpful ideas?
The remote server doesn't like your user agent?
It'd be easier to help if you post a working sample.
That was a working sample!
Oooops, I guess it is my brain that's not working, then! Sorry about
that.
I tried your sample and got the 403. This works for me:
>>> import urllib2
>>> user_agent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"
>>> url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"
>>> req = urllib2.Request(url, None, { 'User-Agent' : user_agent})
>>> f = urllib2.urlopen(req)
>>> s=f.read()
>>> f.close()
>>> print s
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0">
<channel>
<title>Chromium-Announce Google Group</title>
<link>http://groups.google.com/group/chromium-announce</link>
<description>This list is intended for important product
announcements that affect the majority of
etc.
Why Google would deny access to services by unknown User-Agents
is beyond me, especially since in most cases User-Agent strings
are not strict.
Some sites ban UAs that look like bots. I know there's a Java-based
bot with a distinct UA that was really badly behaved when visiting my
server: it ignored robots.txt, fetched pages as quickly as it could,
etc. That was worthy of banning. FWIW, when I try the code above with
a UA of "funny fish" it still works OK, so it looks like the
groups.google.com server has it out for UAs with "Python" in them,
not just unknown ones.
I'm sure that if you changed wget's UA string to something Pythonic it
would start to fail too.
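Incidentally, you can check which User-Agent header would actually go
out without making any network request at all. A minimal sketch in
modern Python (urllib.request replaced urllib2 in Python 3; the
Request mechanics are the same, and the "funny fish" UA here is just
the throwaway value from the test above):

```python
import urllib.request

# The feed URL from the thread; no request is actually sent here.
url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"

# Build a Request with a custom User-Agent, just as urllib2.Request
# is used above -- the headers dict is passed as a keyword argument.
req = urllib.request.Request(url, headers={"User-Agent": "funny fish"})

# Request stores header names title-cased, so look it up as "User-agent".
print(req.get_header("User-agent"))  # -> funny fish
```

Inspecting the Request like this shows exactly what urlopen would
send, so you can experiment with different UA strings (a Pythonic one
vs. a browser-ish one) before hitting the server.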
Cheers
Philip
--
http://mail.python.org/mailman/listinfo/python-list