On Aug 9, 9:41 am, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> On Sun, 09 Aug 2009 06:13:38 -0700, samwyse wrote:
> > Here's what I have so far:
> >
> > import urllib
> >
> > class AppURLopener(urllib.FancyURLopener):
> >     version = "App/1.7"
> >     referrer = None
> >     def __init__(self, *args):
> >         urllib.FancyURLopener.__init__(self, *args)
> >         if self.referrer:
> >             self.addheader('Referer', self.referrer)
> >
> > urllib._urlopener = AppURLopener()
> >
> > Unfortunately, the 'Referer' header potentially varies for each url that
> > I retrieve, and the way the module is written, I can't change the calls
> > to __init__ or open. The best idea I've had is to assign a new value to
> > my class variable just before calling urllib.urlretrieve(), but that
> > just seems ugly. Any ideas? Thanks.
>
> [Aside: an int variable is an int. A str variable is a str. A list
> variable is a list. A class variable is a class. You probably mean a
> class attribute, not a variable. If other languages want to call it a
> variable, or a sausage, that's their problem.]
>
> If you're prepared for a bit of extra work, you could take over all the
> URL handling instead of relying on automatic openers. This will give you
> much finer control, but it will also require more effort on your part.
> The basic idea is, instead of installing openers, and then ask the urllib
> module to handle the connection, you handle the connection yourself:
>
> make a Request object using urllib2.Request
> make an Opener object using urllib2.build_opener
> call opener.open(request) to connect to the server
> deal with the connection (retry, fail or read)
>
> Essentially, you use the Request object instead of a URL, and you would
> add the appropriate referer header to the Request object.
>
> Another approach, perhaps a more minimal change than the above, would be
> something like this:
>
> # untested
> class AppURLopener(urllib.FancyURLopener):
>     version = "App/1.7"
>     def __init__(self, *args):
>         urllib.FancyURLopener.__init__(self, *args)
>     def add_referrer(self, url=None):
>         if url:
>             self.addheader('Referer', url)
>
> urllib._urlopener = AppURLopener()
> urllib._urlopener.add_referrer("http://example.com/")
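
For reference, the first approach quoted above (a Request object plus an opener) might look roughly like the sketch below. It is untested against a live server. The helper names build_request and fetch are hypothetical, and the try/except import is only an assumption so that the same code runs where the names live in urllib2 (Python 2, as in this thread) or in urllib.request (Python 3). The "App/1.7" string is the one from the original class.

```python
try:
    import urllib2 as urlreq            # Python 2, as used in this thread
except ImportError:
    import urllib.request as urlreq     # Python 3 home of the same names


def build_request(url, referer=None, user_agent="App/1.7"):
    """Return a Request for url that carries its own headers (hypothetical helper)."""
    req = urlreq.Request(url)
    req.add_header("User-Agent", user_agent)
    if referer:
        # The header lives on this one Request, not on a shared opener.
        req.add_header("Referer", referer)
    return req


def fetch(url, referer=None, opener=None):
    """Open url with an optional Referer and return the body (hypothetical helper)."""
    opener = opener or urlreq.build_opener()
    response = opener.open(build_request(url, referer))
    try:
        return response.read()
    finally:
        response.close()


# Example call (needs network access, so it is left commented out):
# body = fetch("http://example.com/a/page.html", referer="http://example.com/a/")
```

Because each Request carries its own headers, the Referer can differ for every URL without reassigning a class attribute or touching the installed opener.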

Thanks for the ideas. I'd briefly considered something similar to your first idea, implementing my own version of urlretrieve to accept a Request object, but it does seem like a good bit of work. Maybe over Labor Day. :)

The second idea is pretty much what I'm going to go with for now. The program that I'm writing is almost a clone of wget, but it fixes some things I personally dislike about the way recursive retrievals are done. (Or maybe I just don't understand wget's full array of options well enough.) This means that my referrer changes as I bounce up and down the hierarchy, which makes this less convenient. Still, it does seem more convenient than rewriting the module from scratch.

--
http://mail.python.org/mailman/listinfo/python-list
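
To illustrate the point about the referrer changing during a recursive retrieval, here is a minimal sketch of one way to keep track of it: queue each URL together with the page that linked to it. This is not the program described above. The walk function, the get_links callback and the example site dictionary are all hypothetical, and no network access is involved.

```python
from collections import deque


def walk(start_url, get_links):
    """Yield (url, referer) pairs breadth-first, starting from start_url.

    get_links(url, referer) is a hypothetical callback that fetches a page
    and returns the URLs it links to.
    """
    seen = set([start_url])
    queue = deque([(start_url, None)])   # the first page has no referer
    while queue:
        url, referer = queue.popleft()
        yield url, referer
        for link in get_links(url, referer):
            if link not in seen:
                seen.add(link)
                queue.append((link, url))   # the current page becomes the referer


# Stand-in for real fetching: a made-up link graph.
site = {
    "http://example.com/": ["http://example.com/a/", "http://example.com/b/"],
    "http://example.com/a/": ["http://example.com/a/1.html", "http://example.com/"],
}


def links_from_dict(url, referer):
    return site.get(url, [])


pairs = list(walk("http://example.com/", links_from_dict))
```

In a real crawler the callback would pass the referer along to whatever performs the download, for example by calling add_referrer before each retrieval or by building a Request with its own Referer header.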