On Jan 11, 2009, at 10:05 PM, James Mills wrote:
On Mon, Jan 12, 2009 at 12:58 PM, Philip Semanchuk <phi...@semanchuk.com> wrote:
On Jan 11, 2009, at 8:59 PM, James Mills wrote:
Hey all,
The following fails for me:
from urllib2 import urlopen
f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 389, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
However, that _same_ url works perfectly fine on the
same machine (and same network) using any of:
* curl
* wget
* elinks
* firefox
Any helpful ideas?
The remote server doesn't like your user agent?
It'd be easier to help if you post a working sample.
That was a working sample!
Oooops, I guess it is my brain that's not working, then! Sorry about
that.
I tried your sample and got the 403. This works for me:
>>> import urllib2
>>> user_agent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"
>>> url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"
>>> req = urllib2.Request(url, None, { 'User-Agent' : user_agent})
>>> f = urllib2.urlopen(req)
>>> s=f.read()
>>> f.close()
>>> print s
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0">
<channel>
<title>Chromium-Announce Google Group</title>
<link>http://groups.google.com/group/chromium-announce</link>
<description>This list is intended for important product
announcements that affect the majority of
etc.
Why Google would deny access to services by unknown User-Agents
is beyond me, especially since in most cases User-Agent strings
are not strict.
Some sites ban UAs that look like bots. I know there's a Java-based
bot with a distinct UA that was really badly behaved when visiting my
server: it ignored robots.txt, fetched pages as quickly as it could,
etc. That was worthy of banning. FWIW, when I try the code above with
a UA of "funny fish" it still works OK, so it looks like the
groups.google.com server has it out for UAs with "Python" in them,
not just unknown ones.
I'm sure that if you changed wget's UA string to something Pythonic it
would start to fail too.
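Incidentally, you can check which User-Agent header would actually go
out without making any network request at all. A minimal sketch in
modern Python (urllib.request replaced urllib2 in Python 3; the
Request mechanics are the same, and the "funny fish" UA here is just
the throwaway value from the test above):

```python
import urllib.request

# The feed URL from the thread; no request is actually sent here.
url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"

# Build a Request with a custom User-Agent, just as urllib2.Request
# is used above -- the headers dict is passed as a keyword argument.
req = urllib.request.Request(url, headers={"User-Agent": "funny fish"})

# Request stores header names title-cased, so look it up as "User-agent".
print(req.get_header("User-agent"))  # -> funny fish
```

Inspecting the Request like this shows exactly what urlopen would
send, so you can experiment with different UA strings (a Pythonic one
vs. a browser-ish one) before hitting the server.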
Cheers
Philip
--
http://mail.python.org/mailman/listinfo/python-list