[email protected] wrote:
> On 01:20 pm, [email protected] wrote:
>> Hello!
>>
>> I am a newbie in twisted, sorry if my question sounds awkward.
>>
>> I have written a pretty simple recursive page downloader, which parses
>> an HTML page, extracts all the needed links from it, and starts
>> downloading them. The links are video files, so they are pretty large.
>> The problem is that the downloader works TOO FAST :) I want to set
>> something like a global bandwidth limit or a maximum number of
>> concurrently downloading files.
>>
>> I am using twisted.web.client.downloadPage to download the files and
>> using the Deferred that it returns.
>> I can't work out how to make it still return a Deferred corresponding
>> to that file, but not start downloading right away; instead it should
>> start downloading on some kind of event (a manager-like wrapper for
>> that function).
>>
>> So I want the code to still look simple, like this:
>>
>>     for link in links:
>>         d = downloadPage_limited(link, filename)
>>
>> And the wrapper (the function downloadPage_limited) will manage the
>> number of concurrent downloads, and will still return the Deferred
>> returned by twisted.web.client.downloadPage.
>>
>> Is my idea of a "wrapper" practical, and what's the general way to
>> write it?
>> On which event is it better to decrement the counter of currently
>> downloading files?
>
> Yes, that's a good idea.
>
> You might be able to use twisted.internet.defer.DeferredSemaphore to
> handle all of the counting for you. For example,
>
>     from twisted.internet.defer import DeferredSemaphore
>     from twisted.web.client import downloadPage
>
>     class LimitedDownloader:
>         def __init__(self, howMany):
>             self._semaphore = DeferredSemaphore(howMany)
>
>         def downloadPage(self, *a, **kw):
>             return self._semaphore.run(downloadPage, *a, **kw)
>
>     downloader = LimitedDownloader(3)
>     downloader.downloadPage(...)
>
> In this example, DeferredSemaphore.run will only let 3 downloadPage
> calls run concurrently. If a 4th is attempted before any earlier ones
> finish, it won't actually be called until one of the earlier ones does
> finish, and then it will be called.
Thanks for the quick and great help, Terry and Jean-Paul!

_______________________________________________
Twisted-web mailing list
[email protected]
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
