urllib2 disable proxy
Hello list,

I've been looking for a way to explicitly disable the use of proxies with urllib2, no matter what the environment dictates. Unfortunately I can't find a way in the documentation, and reading the source leads me to believe that something like the following does the job:

    req.set_proxy(None, None)

Where req is a urllib2.Request instance. So is there an official way of doing this? Perhaps it should be added in the documentation?

Thanks in advance,
Dimitris
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: urllib2 disable proxy
On Wed, 2 Jan 2008, Rob Wolfe wrote:
> I believe that the recommended way is to use `urllib2.ProxyHandler`.
> Take a look at:
> http://www.voidspace.org.uk/python/articles/urllib2.shtml

Thanks for the pointer, I will use that way. However, it seems a rather inelegant way to do something so simple, and I was hoping not to mess with ProxyHandler, especially since I want *no* proxy... IMHO something like the following would be more elegant:

    req.set_proxy('', 'http')

or

    req.set_proxy(None, 'http')

However, these ways *don't* work. Do you think I should file a feature request somewhere or send this to the python-dev list?

Thank you for the help,
Dimitris

> HTH,
> Rob
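For reference, the ProxyHandler approach Rob points to can be sketched as follows. This is a minimal sketch rather than an official recipe: it relies on ProxyHandler accepting an explicit mapping of proxies, so an empty dictionary means that no proxy is configured, whatever the environment says. The try/except import is an addition (not from the thread) so the same code also runs on Python 3, where urllib2 became urllib.request; the names `no_proxy` and `opener` are illustrative.

```python
try:
    import urllib2                      # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2    # Python 3 renamed the module

# An empty mapping means "no proxies at all". Passing None (the default)
# would instead make ProxyHandler read the proxy settings from the
# environment.
no_proxy = urllib2.ProxyHandler({})

# build_opener() skips its default ProxyHandler when one is supplied.
opener = urllib2.build_opener(no_proxy)

# Use opener.open(url) for individual requests, or call
# urllib2.install_opener(opener) so that urllib2.urlopen() uses it too.
```

The opener is built once and reused, so the no-proxy choice does not have to be repeated on every Request object.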
urllib2 rate limiting
Hello list,

I want to limit the download speed when using urllib2. In particular, having several parallel downloads, I want to make sure that their total speed doesn't exceed a maximum value.

I can't find a simple way to achieve this. After some research I have a few ideas to try, but I'm stuck on the details:

1) Can I override some method in _socket.py to achieve this, and perhaps make this generic enough to work even with libraries other than urllib2?

2) There is the urllib.urlretrieve() function, which accepts a reporthook parameter. Perhaps I can have reporthook increment a global counter and sleep as necessary when a threshold is reached. However, there is nothing similar in urllib2. Isn't urllib2 supposed to be a superset of urllib in functionality? Why is there no reporthook parameter in any of urllib2's functions? Moreover, even the existing way reporthook can be used doesn't seem quite right: reporthook(blocknum, bs, size) is always called with bs=8K, even for the last block, and sometimes (blocknum*bs > size) is possible if the server sends a wrong Content-Length HTTP header.

3) Perhaps I can use filehandle.read(1024) and manually read as many chunks of data as I need. However, I think this would generally be inefficient, and I'm not sure how it would work because of the internal buffering of urllib2.

So how do you think I can achieve rate limiting in urllib2?

Thanks in advance,
Dimitris

P.S. And something simpler: How can I prevent urllib2 from following redirections to foreign hosts?
Re: urllib2 rate limiting
On Thu, 10 Jan 2008, Rob Wolfe wrote:
> Dimitrios Apostolou <[EMAIL PROTECTED]> writes:
>
>> P.S. And something simpler: How can I disallow urllib2 to follow
>> redirections to foreign hosts?
>
> You need to subclass `urllib2.HTTPRedirectHandler`, override
> `http_error_301` and `http_error_302` methods and throw
> `urllib2.HTTPError` exception.

Thanks! I think for my case it's better to override the redirect_request method and return a Request only if the redirection goes to the same site. Just another question, because I can't find in the docs the meaning of the (req, fp, code, msg, hdrs) parameters: to read the URL I get redirected to (the 'Location:' HTTP header?), should I check the hdrs parameter, or is there a better way?

Thanks,
Dimitris

> http://diveintopython.org/http_web_services/redirects.html
>
> HTH,
> Rob
Re: urllib2 rate limiting
On Thursday 10 January 2008 22:42:44 Rob Wolfe wrote:
> Well, according to the documentation there is no better way.
> But I looked into the source code of `urllib2` and it seems
> that `redirect_request` method takes one more parameter
> `newurl`, which is probably what you're looking for. ;)
>
> Regards,
> Rob

Cool! :-) Sometimes undocumented features provide superb solutions... I wonder if there is something similar for rate limiting :-s

Thank you,
Dimitris
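The idea discussed above (override redirect_request and use its newurl parameter) could look roughly like the sketch below. It is only an illustration under stated assumptions: "same site" is taken to mean "same hostname", the class name SameHostRedirectHandler is made up for this example, and the try/except imports are an addition so the code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                          # Python 2, as used in this thread
    from urlparse import urlparse
except ImportError:
    import urllib.request as urllib2        # Python 3 renamed the module
    from urllib.parse import urlparse


class SameHostRedirectHandler(urllib2.HTTPRedirectHandler):
    """Follow redirects only when they stay on the original host.

    Hypothetical helper for illustration; not part of urllib2.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        old_host = urlparse(req.get_full_url()).hostname
        new_host = urlparse(newurl).hostname
        if old_host != new_host:
            # Returning None declines the redirect; unless another handler
            # deals with it, the caller ends up with an HTTPError.
            return None
        # Same host: let the stock implementation build the new Request.
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


opener = urllib2.build_opener(SameHostRedirectHandler())
# opener.open(url) now refuses to follow redirects to other hosts.
```

Comparing hostnames ignores the port and the scheme; a stricter policy could compare those as well.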
Re: urllib2 rate limiting
On Fri, 11 Jan 2008, Nick Craig-Wood wrote:
> Here is an implementation based on that idea. I've used urllib rather
> than urllib2 as that is what I'm familiar with.

Thanks! Really nice implementation. However, I'm stuck with urllib2 because of its extra functionality, so I'll try to implement something similar using handle.read(1024) to read in small chunks.

It really seems weird that urllib2 is missing reporthook functionality!

Thank you,
Dimitris
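The chunked-read plan mentioned above might be sketched like this. It is not Nick's implementation (which is not reproduced in this thread) and not a urllib2 feature: RateLimiter and throttled_copy are hypothetical names, and the approach simply sleeps after each small read so that all threads sharing one limiter stay under a common average rate. It works on any file-like object with a read() method, which is the interface the object returned by urllib2.urlopen() offers.

```python
import threading
import time


class RateLimiter(object):
    """Share one bytes-per-second budget between several threads.

    Hypothetical helper for illustration; not part of urllib2.
    """

    def __init__(self, max_bytes_per_sec):
        self.rate = float(max_bytes_per_sec)
        self.lock = threading.Lock()
        self.next_free = time.time()    # when the budget used so far runs out

    def consume(self, nbytes):
        """Account for nbytes just read, then sleep until they are paid for."""
        with self.lock:
            now = time.time()
            # Queue this chunk behind everything already accounted for.
            self.next_free = max(now, self.next_free) + nbytes / self.rate
            delay = self.next_free - now
        time.sleep(delay)               # sleep outside the lock


def throttled_copy(src, dst, limiter, chunk_size=1024):
    """Copy src to dst in small chunks, pausing as the limiter dictates."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:                   # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        limiter.consume(len(chunk))
    return total


# Intended use (not executed here; the URL and file name are placeholders):
#   limiter = RateLimiter(50 * 1024)            # 50 KiB/s for all downloads
#   response = urllib2.urlopen("http://example.com/big.file")
#   with open("big.file", "wb") as out:
#       throttled_copy(response, out, limiter)
```

Because every download thread calls consume() on the same limiter, the cap applies to their combined speed rather than to each one separately. The limit is on how fast the application drains the connection, so short bursts up to the size of the operating system's socket buffers are still possible.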