On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković <tinches...@gmail.com> wrote:
> * my gut feeling is spawning a thousand tasks and having them all fighting
> over the same semaphore and scheduling is going to be much less efficient
> than a small number of tasks draining a queue.
Fundamentally, a Semaphore is a queue:

https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9118/Lib/asyncio/locks.py#L437

...so the two approaches are more analogous than they might appear at
first. The big difference is what objects are in the queue. For a web
scraper, the options might be either a queue where each entry is a URL
represented as a str, or a queue where each entry is (effectively) a
Task object with an attached coroutine object.

So I think the main differences you'll see in practice are:

- A Task + coroutine aren't terribly big -- maybe a few kilobytes --
  but definitely larger than a str, so the Semaphore approach will take
  more RAM. Modern machines have lots of RAM, so for many use cases
  this is still probably fine (50,000 tasks is really not that many).
  But there will certainly be some situations where the str queue fits
  in RAM but the Task queue doesn't.

- If you create all those Task objects up front, that front-loads a
  chunk of work (i.e., allocating all those objects!) that would
  otherwise be spread throughout the queue processing. So you'll see a
  noticeable pause up front before the code starts working.

-n

--
Nathaniel J. Smith -- https://vorpus.org
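P.S. To make the comparison concrete, here's a rough sketch of the two
patterns side by side. This is illustrative only -- fetch(), URLS, and
LIMIT are made-up stand-ins for a real scraper's fetch logic, URL list,
and concurrency cap:

import asyncio

# Hypothetical inputs: 50,000 URLs, at most 100 fetches in flight.
URLS = ["https://example.com/%d" % i for i in range(50_000)]
LIMIT = 100

async def fetch(url):
    await asyncio.sleep(0.01)  # stand-in for real network I/O

# Pattern 1: one Task per URL, all gated by a Semaphore.
# All 50,000 Task + coroutine objects get allocated up front.
async def semaphore_version():
    sem = asyncio.Semaphore(LIMIT)

    async def bounded(url):
        async with sem:
            await fetch(url)

    # gather() wraps each coroutine in a Task immediately.
    await asyncio.gather(*(bounded(u) for u in URLS))

# Pattern 2: LIMIT worker Tasks draining a queue of plain strs.
# Only the small worker pool exists as Tasks; the queue holds strings.
async def queue_version():
    q = asyncio.Queue()
    for u in URLS:
        q.put_nowait(u)

    async def worker():
        while True:
            url = await q.get()
            await fetch(url)
            q.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(LIMIT)]
    await q.join()  # wait until every URL has been processed
    for w in workers:
        w.cancel()

asyncio.run(queue_version())  # or semaphore_version()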