Re: [Python-Dev] A more flexible task creation
On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković wrote:
> * my gut feeling is spawning a thousand tasks and having them all fighting
> over the same semaphore and scheduling is going to be much less efficient
> than a small number of tasks draining a queue.

Fundamentally, a Semaphore is a queue:

https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9118/Lib/asyncio/locks.py#L437

...so the two approaches are more analogous than they might appear at first. The big difference is what objects are in the queue. For a web scraper, the options might be a queue where each entry is a URL represented as a str, versus a queue where each entry is (effectively) a Task object with an attached coroutine object.

So I think the main differences you'll see in practice are:

- A Task + coroutine aren't terribly big -- maybe a few kilobytes -- but they're definitely larger than a str, so the Semaphore approach will use more RAM. Modern machines have lots of RAM, so for many use cases this is still probably fine (50,000 tasks is really not that many), but there will certainly be some situations where the str queue fits in RAM and the Task queue doesn't.

- If you create all those Task objects up front, that front-loads a chunk of work (i.e., allocating all those objects!) that would otherwise be spread throughout the queue processing, so you'll see a noticeable pause up front before the code starts working.

-n

--
Nathaniel J. Smith -- https://vorpus.org
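For concreteness, here is a minimal sketch of the two patterns being compared: one Task per item gated by a Semaphore, versus a few worker tasks draining a queue of strs. The fetch() coroutine and the URL list are illustrative stand-ins, not code from the thread:

    import asyncio

    async def fetch(url):
        # stand-in for the real scraping/DB work
        await asyncio.sleep(0.01)

    urls = ["https://example.com/page/%d" % i for i in range(1000)]

    # Semaphore style: every URL becomes a Task object up front;
    # the semaphore only bounds how many run at once.
    async def semaphore_style():
        sem = asyncio.Semaphore(20)

        async def bounded(url):
            async with sem:
                await fetch(url)

        await asyncio.gather(*(bounded(u) for u in urls))

    # Worker style: 20 tasks total, draining a queue of plain strs.
    async def worker_style():
        queue = asyncio.Queue()
        for u in urls:
            queue.put_nowait(u)

        async def worker():
            # empty()/get_nowait() are safe back-to-back here: no await
            # between them, so no other task can run in between
            while not queue.empty():
                await fetch(queue.get_nowait())

        await asyncio.gather(*(worker() for _ in range(20)))

    # e.g. asyncio.get_event_loop().run_until_complete(worker_style())

In both cases 20 fetches are in flight at a time; the difference is whether the pending work is stored as Task objects or as strings, which is exactly the RAM and front-loading trade-off described above.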
Re: [Python-Dev] Python-Dev Digest, Vol 179, Issue 21
The Jseries acknowledgement by using Jetty containers can get you a best resolution to Python wheel asynchronism bugs

Sent from an Android smartphone with GMX Mail.

On 14/06/2018, 4:00 PM, python-dev-requ...@python.org wrote:

> On 13 Jun 2018, at 15:42, Nick Coghlan <ncogh...@gmail.com> wrote:
>
>> On 13 June 2018 at 02:23, Guido van Rossum <gu...@python.org> wrote:
>>> So, to summarize, we need something like six for C?
>>
>> Yeah, pretty much - once we can get to the point where it's routine for
>> folks to be building "abiX" or "abiXY" wheels (with the latter not actually
>> being a defined compatibility tag yet, but having the meaning of "targets
>> the stable ABI as first defined in CPython X.Y"), rather than
>> feature-release-specific "cpXYm" ones, then a *lot* of the extension module
>> maintenance pain otherwise arising from more frequent CPython releases
>> should be avoided.
>>
>> There'd still be a lot of other details to work out to turn the proposed
>> release cadence change into a practical reality, but this is the key piece
>> that I think is a primarily technical hurdle: simplifying the current
>> "wheel-per-python-version-per-target-platform" community project build
>> matrices to instead be "wheel-per-target-platform".
>
> This requires getting people to mostly stop using the non-stable ABI, and
> that could be a lot of work for projects that have existing C extensions
> that don't use the stable ABI or cython/cffi/…
>
> That said, the CPython API tends to be fairly stable across releases, and
> even without using the stable ABI, supporting faster CPython feature
> releases shouldn't be too onerous, especially for projects with some kind
> of automation for creating release artefacts (such as a CI system).
>
> Ronald
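For readers following the stable-ABI subthread, here is a sketch of one way a project can target the stable ABI with setuptools today and get an "abi3" wheel; the module name and version macro are illustrative:

    # setup.py -- build an extension against the stable ABI (abi3)
    from setuptools import setup, Extension

    setup(
        name="demo",
        version="1.0",
        ext_modules=[
            Extension(
                "demo",
                ["demo.c"],
                # restrict demo.c to the limited API as of CPython 3.5
                define_macros=[("Py_LIMITED_API", "0x03050000")],
                py_limited_api=True,
            )
        ],
    )

Building with "python setup.py bdist_wheel --py-limited-api=cp35" then tags the wheel cp35-abi3, so a single artefact covers 3.5 and later on that platform.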
Re: [Python-Dev] A more flexible task creation
On Thu, Jun 14, 2018 at 10:03 PM Steve Dower wrote:
> I often use semaphores for this when I need it, and it looks like
> asyncio.Semaphore() is sufficient for this:
>
>     import asyncio
>
>     task_limiter = asyncio.Semaphore(4)
>
>     async def my_task():
>         await task_limiter.acquire()
>         try:
>             await do_db_request()
>         finally:
>             task_limiter.release()

Yeah, a semaphore logically fits exactly, but:

* I feel this API is somewhat clunky, even if you use an 'async with'.
* My gut feeling is spawning a thousand tasks and having them all fighting over the same semaphore and scheduling is going to be much less efficient than a small number of tasks draining a queue.
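For reference, the 'async with' spelling mentioned above would look like this; do_db_request() is assumed from Steve's snippet:

    import asyncio

    task_limiter = asyncio.Semaphore(4)

    async def my_task():
        # equivalent to acquire()/try/finally/release()
        async with task_limiter:
            await do_db_request()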
Re: [Python-Dev] A more flexible task creation
Other folks have already chimed in, so I'll be to the point. Try writing a simple asyncio web scraper (using maybe the aiohttp library) and creating 5000 tasks for scraping different sites. My prediction is a whole lot of them will time out, for various reasons.

Other responses inline.

On Thu, Jun 14, 2018 at 9:15 PM Chris Barker wrote:

> async is not parallel -- all the tasks will be run in the same thread
> (unless you explicitly spawn another thread), and only one task is running
> at once, and the task switching happens when the task specifically releases
> itself.

asyncio is mostly used for IO-heavy workloads (note the name). If you're doing IO in asyncio, it is most definitely parallel. The point of it is having a large number of open network connections at the same time.

> So why do queries fail with 10,000 tasks? or ANY number? If the async DB
> access code is written right, a given query should not "await" unless it is
> in a safe state to do so.

Imagine you have a batch job you need to do. You need to fetch a million records from your database, and you can't use a query to get them all - you need a million individual "get" requests. Even if Python were infinitely fast and your bandwidth were infinite, could your database handle opening a million new connections in parallel, in a very short time? Mine sure can't; even a few hundred extra connections would be a potential problem. So you want to do the work in chunks, but still not one by one (see the sketch at the end of this message).

> and threads aren't synchronous -- but they are concurrent.

Using threads implies coupling threads with IO: doing requests one at a time in a given thread. That's generally called 'synchronous IO', as opposed to asynchronous IO/asyncio.

> because threads ARE concurrent, and there is no advantage to having more
> threads than can actually run at once, and having many more does cause
> thread-switching performance issues.

Weeell, technically threads in CPython aren't really concurrent (when running Python bytecode), but for doing IO they are in practice. When doing IO, there absolutely is an advantage to using more threads than can run at once (in CPython, only one thread running Python code can run at once). You can test it out yourself by writing a synchronous web scraper (using maybe the requests library) and trying to scrape using a thread pool vs a single thread. You'll find the thread-pool version is much faster.
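As promised above, a rough sketch of the chunked approach, with get_record() as a hypothetical stand-in for the individual "get" request:

    import asyncio

    async def get_record(key):
        ...  # one individual "get" request against the database

    async def fetch_all(keys, chunk_size=20):
        results = []
        for i in range(0, len(keys), chunk_size):
            chunk = keys[i:i + chunk_size]
            # at most chunk_size connections are open at any moment
            results.extend(
                await asyncio.gather(*(get_record(k) for k in chunk)))
        return results

Note that a plain chunked gather waits for the slowest request in each chunk before starting the next one; the semaphore and worker-pool variants discussed elsewhere in the thread avoid that stall.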
Re: [Python-Dev] A more flexible task creation
On 14Jun2018 1214, Chris Barker via Python-Dev wrote:
> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
> understand the problem here. But if I have this right:
>
>> I've been using asyncio a lot lately and have encountered this problem
>> several times. Imagine you want to do a lot of queries against a database;
>> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
>
> async is not parallel -- all the tasks will be run in the same thread
> (unless you explicitly spawn another thread), and only one task is running
> at once, and the task switching happens when the task specifically releases
> itself.

If the task isn't actually doing the work, but merely waiting for it to finish, then you can end up overloading the thing that *is* doing the task (e.g. the network interface, database server, other thread/process, file system, etc.).

Single-threaded async is actually all about *waiting* - it provides a convenient model for doing other tasks while you are waiting for the first (as well as a convenient model for indicating what should be done after it completes - there are two conveniences here).

If the underlying thing you're doing *can* run in parallel, but becomes less efficient the more times you do it (for example, most file system operations fall into this category), you will want to limit how many tasks you *start*, not just how many you are waiting for. I often use semaphores for this when I need it, and it looks like asyncio.Semaphore() is sufficient:

    import asyncio

    task_limiter = asyncio.Semaphore(4)

    async def my_task():
        await task_limiter.acquire()
        try:
            await do_db_request()
        finally:
            task_limiter.release()

Cheers,
Steve
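A sketch of how the limiter above might be driven; the 10,000 figure is illustrative:

    async def main():
        # all 10,000 coroutine/Task objects exist up front,
        # but the semaphore lets only 4 past at a time
        await asyncio.gather(*(my_task() for _ in range(10000)))

    asyncio.get_event_loop().run_until_complete(main())

Note that all the Task objects are created immediately and only their execution is throttled, which is the memory trade-off discussed elsewhere in the thread.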
Re: [Python-Dev] A more flexible task creation
On Thu, Jun 14, 2018 at 9:17 PM Chris Barker via Python-Dev <python-dev@python.org> wrote:
> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
> understand the problem here.

Vocabulary-wise, 'queue depth' might be a suitable mental aid for what people actually want to limit. The practical issue is most likely something to do with hitting timeouts when trying to queue too much work onto a service.

--
Joni Orponen
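A minimal illustration of bounding queue depth with backpressure; the maxsize of 20 and the producer are illustrative:

    import asyncio

    queue = asyncio.Queue(maxsize=20)  # caps the "queue depth"

    async def producer(jobs):
        for job in jobs:
            # suspends here whenever 20 jobs are already pending,
            # instead of piling up work (and timeouts) on the service
            await queue.put(job)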
Re: [Python-Dev] A more flexible task creation
Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to understand the problem here. But if I have this right:

> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database;
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.

async is not parallel -- all the tasks will be run in the same thread (unless you explicitly spawn another thread), and only one task is running at once, and the task switching happens when the task specifically releases itself. If it matters in what order the tasks are performed, then you should not be using async.

So why do queries fail with 10,000 tasks? or ANY number? If the async DB access code is written right, a given query should not "await" unless it is in a safe state to do so. So what am I missing here???

> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.

Still wrapping my head around the vocabulary, but async is not concurrent.

> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections,

and threads aren't synchronous -- but they are concurrent.

> we would use a thread pool with a limited number of threads and submit the
> jobs into its queue.

because threads ARE concurrent, and there is no advantage to having more threads than can actually run at once, and having many more does cause thread-switching performance issues.

> To me, tasks are (somewhat) logically analogous to threads.

kinda -- in the sense that they are run (and completed) in arbitrary order. But they are different, and that difference is key to this issue.

As Yury expressed interest in this idea, there must be something I'm missing. What is it?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [Python-Dev] Idea: reduce GC threshold in development mode (-X dev)
On Fri, 8 Jun 2018 09:48:03 +0200 Victor Stinner wrote:
> Question: Do you think that bugs spotted by a GC collection are common
> enough to change the GC thresholds in development mode (new -X dev
> flag of Python 3.7)?

I don't think replacing a more-or-less arbitrary value with another more-or-less arbitrary value is a very useful change.

Regards

Antoine.
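For context, the change under discussion amounts to something like the following at startup in -X dev mode; the value 5 is purely illustrative:

    import gc

    gc.get_threshold()   # defaults: (700, 10, 10)
    gc.set_threshold(5)  # collect far more often, so GC-visible bugs surface sooner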
Re: [Python-Dev] A more flexible task creation
On Thu, 14 Jun 2018 at 17:40, Tin Tvrtković wrote:
> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database;
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution
> that first comes to mind is to create an AsyncioTaskExecutor with a
> submit(coro, *args, **kwargs) method. Put a reference to the coroutine and
> its arguments into an asyncio queue. Spawn n tasks pulling from this queue
> and awaiting the coroutines.
>
> It'd probably be useful to have this in the stdlib at some point.

Probably a good idea, yes, because it seems a rather common use case.

OTOH, I did something similar, but for a different use case. In my case, I have a Watchdog class that takes a list of (coro, *args, **kwargs). What it does is ensure there is always a running task for each of the coroutines, and it watches the tasks: if they crash, they are automatically restarted (with logging). Then there is a stop() method to cancel the watchdog-managed tasks and await them. (A rough sketch of this idea follows at the end of this message.) My use case is that I tend to write a lot of singleton-style objects which need bookkeeping tasks, or redis pubsub listening tasks, and my primary concern is not starting lots of tasks; it is that the few tasks I have must be restarted if they crash, forever.

This is why I think it's not that hard to write "sugar" APIs on top of asyncio, and everyone's needs will be different. The strict API compatibility requirements of the core Python stdlib, coupled with the very long feature release life cycles of Python, make me think this sort of thing is perhaps better built in a utility library on top of asyncio, rather than inside asyncio itself? 18 months is a long, long time to iterate on these features. I can't wait for Python 3.8...

>> Date: Wed, 13 Jun 2018 22:45:22 +0200
>> From: Michel Desmoulin
>> To: python-dev@python.org
>> Subject: [Python-Dev] A more flexible task creation
>>
>> I was working on concurrency-limiting code for asyncio, so the user
>> may submit as many tasks as one wants, but only a max number of tasks
>> will be submitted to the event loop at the same time.
>>
>> However, I wanted passing an awaitable to always return a task,
>> no matter whether the task was currently scheduled or not. The goal is
>> that you could add done callbacks to it, decide to force-schedule it, etc.
>>
>> I dug into the asyncio.Task code, and encountered:
>>
>>     def __init__(self, coro, *, loop=None):
>>         ...
>>         self._loop.call_soon(self._step)
>>         self.__class__._all_tasks.add(self)
>>
>> I was surprised to see that instantiating a Task class has any side
>> effect at all, let alone two, one of them being immediate scheduling
>> for execution.
>>
>> I couldn't find a clean way to do what I wanted: either you
>> loop.create_task() and you get a task but it runs, or you don't run
>> anything, but you don't get a nice task object to hold on to.
>>
>> I tried several alternatives, like returning a future and binding the
>> future's awaiting to the submission of a task, but that was complicated
>> code that duplicated a lot of things.
>>
>> I tried creating a custom task, but it was even harder: setting a custom
>> event policy, to provide a custom event loop with my own create_task()
>> accepting parameters. That's a lot to do just to provide a parameter to
>> Task, especially if you already use a custom event loop (e.g. uvloop). I
>> was expecting to have to create a task factory only, but task factories
>> can't get any additional parameters from create_task().
>>
>> Additionally, I can't use ensure_future(), as it doesn't allow passing
>> any parameter to the underlying Task, so if I want to accept any
>> awaitable in my signature, I need to provide my own custom
>> ensure_future().
>>
>> All those implementations access a lot of _private_api, and do other
>> shady things that linters hate; plus they are fragile at best. What's
>> more, Task being rewritten in C prevents things like setting self._coro,
>> so we can only inherit from the pure Python slow version.
>>
>> In the end, I can't even await the lazy task, because it blocks the
>> entire program.
>>
>> Hence I have 2 distinct, but independent albeit related, proposals:
>>
>> - Allow Task to be created but not scheduled for execution, and add a
>> parameter to ensure_future() and create_task() to control this. Awaiting
>> such a task would just do like asyncio.sleep(0) until it is scheduled
>> for execution.
>>
>> - Add a parameter to ensure_future() and create_task() named "kwargs"
>> that accepts a mapping and will be passed as **kwargs to the underlying
>> created Task.
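As mentioned above, a rough sketch of the Watchdog idea; the names and details here are guesses, not the poster's actual code:

    import asyncio
    import logging

    class Watchdog:
        def __init__(self, coro_specs):
            # coro_specs: iterable of (coro_func, args, kwargs)
            self._tasks = [asyncio.ensure_future(self._supervise(f, a, kw))
                           for f, a, kw in coro_specs]

        async def _supervise(self, coro_func, args, kwargs):
            # keep one task alive per coroutine, forever
            while True:
                try:
                    await coro_func(*args, **kwargs)
                except asyncio.CancelledError:
                    raise
                except Exception:
                    logging.exception("%r crashed, restarting", coro_func)

        async def stop(self):
            for task in self._tasks:
                task.cancel()
            await asyncio.gather(*self._tasks, return_exceptions=True)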
Re: [Python-Dev] A more flexible task creation
On Thu, Jun 14, 2018 at 12:40 PM Tin Tvrtković wrote:
> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database;
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution that
> first comes to mind is to create an AsyncioTaskExecutor with a submit(coro,
> *args, **kwargs) method. Put a reference to the coroutine and its arguments
> into an asyncio queue. Spawn n tasks pulling from this queue and awaiting the
> coroutines.
>
> It'd probably be useful to have this in the stdlib at some point.

Sounds like a good idea! Feel free to open an issue to prototype the API.

Yury
Re: [Python-Dev] A more flexible task creation
Hi,

I've been using asyncio a lot lately and have encountered this problem several times. Imagine you want to do a lot of queries against a database; spawning 10,000 tasks in parallel will probably cause a lot of them to fail. What you need is a task pool of sorts, to limit concurrency and do only 20 requests in parallel.

If we were doing this synchronously, we wouldn't spawn 10,000 threads using 10,000 connections, we would use a thread pool with a limited number of threads and submit the jobs into its queue.

To me, tasks are (somewhat) logically analogous to threads. The solution that first comes to mind is to create an AsyncioTaskExecutor with a submit(coro, *args, **kwargs) method. Put a reference to the coroutine and its arguments into an asyncio queue. Spawn n tasks pulling from this queue and awaiting the coroutines. (A rough sketch of this idea follows at the end of this message.)

It'd probably be useful to have this in the stdlib at some point.

> Date: Wed, 13 Jun 2018 22:45:22 +0200
> From: Michel Desmoulin
> To: python-dev@python.org
> Subject: [Python-Dev] A more flexible task creation
>
> I was working on concurrency-limiting code for asyncio, so the user
> may submit as many tasks as one wants, but only a max number of tasks
> will be submitted to the event loop at the same time.
>
> However, I wanted passing an awaitable to always return a task,
> no matter whether the task was currently scheduled or not. The goal is
> that you could add done callbacks to it, decide to force-schedule it, etc.
>
> I dug into the asyncio.Task code, and encountered:
>
>     def __init__(self, coro, *, loop=None):
>         ...
>         self._loop.call_soon(self._step)
>         self.__class__._all_tasks.add(self)
>
> I was surprised to see that instantiating a Task class has any side
> effect at all, let alone two, one of them being immediate scheduling
> for execution.
>
> I couldn't find a clean way to do what I wanted: either you
> loop.create_task() and you get a task but it runs, or you don't run
> anything, but you don't get a nice task object to hold on to.
>
> I tried several alternatives, like returning a future and binding the
> future's awaiting to the submission of a task, but that was complicated
> code that duplicated a lot of things.
>
> I tried creating a custom task, but it was even harder: setting a custom
> event policy, to provide a custom event loop with my own create_task()
> accepting parameters. That's a lot to do just to provide a parameter to
> Task, especially if you already use a custom event loop (e.g. uvloop). I
> was expecting to have to create a task factory only, but task factories
> can't get any additional parameters from create_task().
>
> Additionally, I can't use ensure_future(), as it doesn't allow passing
> any parameter to the underlying Task, so if I want to accept any
> awaitable in my signature, I need to provide my own custom
> ensure_future().
>
> All those implementations access a lot of _private_api, and do other
> shady things that linters hate; plus they are fragile at best. What's
> more, Task being rewritten in C prevents things like setting self._coro,
> so we can only inherit from the pure Python slow version.
>
> In the end, I can't even await the lazy task, because it blocks the
> entire program.
>
> Hence I have 2 distinct, but independent albeit related, proposals:
>
> - Allow Task to be created but not scheduled for execution, and add a
> parameter to ensure_future() and create_task() to control this. Awaiting
> such a task would just do like asyncio.sleep(0) until it is scheduled
> for execution.
>
> - Add a parameter to ensure_future() and create_task() named "kwargs"
> that accepts a mapping and will be passed as **kwargs to the underlying
> created Task.
>
> I insist on the fact that the 2 proposals are independent, so please
> don't reject both if you don't like one or the other. Passing a
> parameter to the underlying custom Task is still of value even without
> the unscheduled instantiation, and vice versa.
>
> Also, if somebody has any idea on how to make a LazyTask that we can
> await on without blocking everything, I'll take it.
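As mentioned above, a rough sketch of the AsyncioTaskExecutor idea; the class name comes from the proposal, everything else (method names, defaults) is guesswork:

    import asyncio

    class AsyncioTaskExecutor:
        def __init__(self, n_workers=20):
            self._queue = asyncio.Queue()
            self._workers = [asyncio.ensure_future(self._worker())
                             for _ in range(n_workers)]

        def submit(self, coro_func, *args, **kwargs):
            # returns a future resolving to the coroutine's result
            fut = asyncio.get_event_loop().create_future()
            self._queue.put_nowait((coro_func, args, kwargs, fut))
            return fut

        async def _worker(self):
            while True:
                coro_func, args, kwargs, fut = await self._queue.get()
                try:
                    fut.set_result(await coro_func(*args, **kwargs))
                except asyncio.CancelledError:
                    fut.cancel()
                    raise
                except Exception as exc:
                    fut.set_exception(exc)

        async def aclose(self):
            for w in self._workers:
                w.cancel()
            await asyncio.gather(*self._workers, return_exceptions=True)

Usage would look something like "result = await executor.submit(do_db_request)"; at most n_workers coroutines run at a time, and the workers run forever until aclose() cancels them.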