On Mon, Jan 12, 2009 at 1:25 PM, Philip Semanchuk <phi...@semanchuk.com> wrote:
> Oooops, I guess it is my brain that's not working, then! Sorry about that.
Nps.

> I tried your sample and got the 403. This works for me: (...)
>
> Some sites ban UAs that look like bots. I know there's a Java-based bot with
> a distinct UA that was really badly-behaved when visiting my server. Ignored
> robots.txt, fetched pages as quickly as it could, etc. That was worthy of
> banning. FWIW, when I try the code above with a UA of "funny fish" it still
> works OK, so it looks like the groups.google.com server has it out for UAs
> with Python in them, not just unknown ones.
>
> I'm sure that if you changed wget's UA string to something Pythonic it would
> start to fail too.

The problem I'm solving - my use case - is a tool that periodically checks
configured RSS feeds for updates. I was going to use urllib2 to fetch the data
and pass it off to feedparser.parse(...).

Because of the UA problem (which can be overcome), though, I decided to try a
different approach and let feedparser do the fetching itself (it uses urllib
internally). The problem is that feedparser doesn't store the HTTP response
content anywhere - only the parsed results - *sigh*.

My solution now is to parse the feed, store the data I need in a simple object,
pickle that to a set of cache files, and compare hashed versions of the content
on each run.

cheers
James
--
http://mail.python.org/mailman/listinfo/python-list
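For anyone else who hits the same 403: the UA can be overridden when building
the request. A rough sketch only - the UA string and URL below are placeholders
of my own, and the try/except covers the Python 3 rename of urllib2:

```python
try:
    from urllib2 import Request, build_opener          # Python 2
except ImportError:
    from urllib.request import Request, build_opener   # Python 3 rename

# Any UA that doesn't contain "Python"; this string is just a placeholder.
UA = 'FeedChecker/0.1 (+http://example.com)'

# Placeholder URL - substitute the real feed address.
req = Request('http://example.com/feed.rss',
              headers={'User-Agent': UA})

# The actual fetch would then be:
# opener = build_opener()
# data = opener.open(req).read()
```

The fetch lines are commented out so the sketch doesn't touch the network; the
point is just that the header goes into the Request before opening it.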
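The cache comparison boils down to something like the following sketch - the
function names and the choice of what feeds the hash (entry titles and links)
are mine, not anything feedparser provides:

```python
import hashlib
import pickle

def content_hash(entries):
    """Hash the (title, link) pairs pulled out of the parsed feed."""
    h = hashlib.sha1()
    for title, link in entries:
        h.update((title + link).encode('utf-8'))
    return h.hexdigest()

def feed_changed(cache_path, entries):
    """Return True if the feed differs from the pickled cache, updating it."""
    new_hash = content_hash(entries)
    try:
        with open(cache_path, 'rb') as f:
            old_hash = pickle.load(f)
    except (IOError, OSError):
        old_hash = None          # no cache yet: treat as changed
    if old_hash == new_hash:
        return False
    with open(cache_path, 'wb') as f:
        pickle.dump(new_hash, f)
    return True
```

First run reports a change (nothing cached), repeat runs with the same entries
report nothing, and any new entry flips it back to changed.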