Re: [Python-Dev] A more flexible task creation

2018-07-11 Thread Michel Desmoulin

> To be honest, I see "async with" being abused everywhere in asyncio,
> lately.  I like to have objects with start() and stop() methods, but
> everywhere I see async context managers.
>
> Fine, add nursery or whatever, but please also have a simple start() /
> stop() public API.
>
> "async with" is only good for functional programming.  If you want to go
> more of an object-oriented style, you tend to have start() and stop()
> methods in your classes, which will call start() & stop() (or close())
> methods recursively on nested resources.  Some of the libraries (aiopg,
> I'm looking at you) don't support start/stop or open/close well.

Wouldn't calling __aenter__ and __aexit__ manually work for you? I
started coding begin() and stop(), but I removed them, as I couldn't
find a use case for them.

And what exactly is the use case that doesn't work with `async with`?
The whole point is to make the boundaries of the tasks' execution easy
to spot. If you start()/stop() at arbitrary points, it kind of defeats
the purpose.

It's a genuine question though. I can totally accept that I overlooked a
valid use case.
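
For instance, something like this minimal sketch (the AsyncStartStop
name is just an illustration, not part of ayo):

    class AsyncStartStop:
        """Expose start()/stop() on top of any async context manager."""

        def __init__(self, acm):
            self._acm = acm  # e.g. a scope/nursery-like object

        async def start(self):
            # what 'async with' does on entry
            return await self._acm.__aenter__()

        async def stop(self):
            # what 'async with' does on a normal exit
            await self._acm.__aexit__(None, None, None)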


> 
> I tend to slightly agree, but OTOH if asyncio had been designed to not
> schedule tasks automatically on __init__ I bet there would have been
> other users complaining that "why didn't task XX run?", or "why do tasks
> need a start() method, that is clunky!".  You can't please everyone...

Well, ensure_future([schedule_immediately=True]) and
asyncio.create_task([schedule_immediately=True]) would take care of that.
They are the entry points for task creation and scheduling.
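
E.g. something like this (hypothetical signature, none of it exists in
asyncio today; fetch() and log_result() are placeholders):

    # You'd get a real Task object back, just not scheduled yet:
    task = asyncio.ensure_future(fetch(url), schedule_immediately=False)
    task.add_done_callback(log_result)  # usable as a task right away
    # ...and later explicitly schedule it (hypothetical API as well):
    task.start()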

> 
> Also, in
>              task_list = run.all(foo(), foo(), foo())
> 
> As soon as you call foo(), you are instantiating a coroutine, which
> consumes memory, while the task may not even be scheduled for a long
> time (if you have 5000 potential tasks but only execute 10 at a time,
> for example).

Yes, but this has the benefit of accepting any awaitable, not just a
coroutine. You don't have to wonder what to pass, or in which form: it's
always the same. Too many APIs are hard to understand because you never
know if they accept a callback, a coroutine function, a coroutine, a task,
a future...

For the same reason, requests.get() creates and destroys a session every
time. It's inefficient, but way easier to understand, and it fits the
majority of use cases.

> 
> But if you do as Yury suggested, you'll instead accept a function
> reference, foo, which is a singleton; you can have many references to
> foo, but they will only create coroutine objects when the task is
> actually about to be scheduled, so it's more efficient in terms of
> memory.

I ran some tests, and the memory consumption is indeed radically smaller
if you just store references, compared to storing the equivalent raw
coroutines.
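
Here is the kind of quick-and-dirty measurement I mean (a sketch using
only the stdlib; zzz() is just a placeholder coroutine):

    import tracemalloc

    async def zzz(seconds):
        pass

    tracemalloc.start()

    snap0 = tracemalloc.take_snapshot()
    coros = [zzz(0.005) for _ in range(10000)]      # eager: coroutine objects
    snap1 = tracemalloc.take_snapshot()
    refs = [(zzz, (0.005,)) for _ in range(10000)]  # lazy: (callable, args)
    snap2 = tracemalloc.take_snapshot()

    eager = sum(s.size_diff for s in snap1.compare_to(snap0, 'lineno'))
    lazy = sum(s.size_diff for s in snap2.compare_to(snap1, 'lineno'))
    print('coroutine objects:', eager, 'bytes / callable refs:', lazy, 'bytes')

    for coro in coros:  # close never-awaited coroutines to avoid warnings
        coro.close()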

However, this is a rare case. It assumes that:

- you have a lot of tasks
- you have a max concurrency
- the max concurrency is very small
- most tasks reuse a similar combination of callables and parameters

It's a very specific, narrow case. Also, everything you store on the
scope will be wrapped into a Future object whether it's scheduled or
not, so that you can cancel it later. So the memory savings are not as
large as they may seem.

I didn't want to compromise the quality of the current API for the
general case for an edge case optimization.

On the other hand, this is low-hanging fruit, and on platforms such as
the Raspberry Pi, where asyncio has a lot to offer, it can make a big
difference to shave up to 20% of the memory consumption of a specific
workload.

So I listened and implemented an escape hatch:

import random
import asyncio

import ayo

async def zzz(seconds):
    await asyncio.sleep(seconds)
    print(f'Slept for {seconds} seconds')


@ayo.run_as_main()
async def main(run_in_top):

    async with ayo.scope(max_concurrency=10) as run:
        for _ in range(10000):
            run.from_callable(zzz, 0.005)  # or run.asap(zzz(0.005))

This only lazily creates the awaitable (here the coroutine) at
scheduling time. I see a 15% memory saving for the WHOLE program when
using `from_callable()`.
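
The underlying idea is roughly this (a sketch, not ayo's actual
implementation):

    import asyncio

    async def run_lazily(pending, max_concurrency):
        # pending: iterable of (callable, args) pairs; the coroutine
        # object for each entry only exists once a worker picks it up.
        queue = asyncio.Queue()
        for item in pending:
            queue.put_nowait(item)

        async def worker():
            while not queue.empty():
                fn, args = queue.get_nowait()
                await fn(*args)  # coroutine created right before running

        await asyncio.gather(*(worker() for _ in range(max_concurrency)))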

So definitely a good feature to have, thank you.

But again, and I hope Yury is reading this because he will implement
that for uvloop, and it will trickle down to asyncio, I think we should
not compromise the main API for this.

asyncio is hard enough to grok, and too many concepts fly around. The
average Python programmer is used to far simpler things from past
encounters with Python.

If we want asyncio to one day be considered the clean AND easy way to
do async, we need to work on the API.

asyncio.run() is a step in the right direction (although, again, I wish
we had implemented it 2 years ago when I talked about it, instead of
being told no).

Now if we add nurseries, it should hide the rest of the complexity. Not
add to it.


Re: [Python-Dev] A more flexible task creation

2018-06-15 Thread Gustavo Carneiro
On Fri, 15 Jun 2018 at 09:18, Michel Desmoulin 
wrote:

>
> >
> > The strict API compatibility requirements of core Python stdlib, coupled
> > with the very long feature release life-cycles of Python, make me think
> > this sort of thing perhaps is better built as a utility library on top
> > of asyncio, rather than inside asyncio itself?  18 months is a long long
> > time to iterate on these features.  I can't wait for Python 3.8...
> >
>
> A lot of my recent requests come from my attempt to group some of that in
> a lib: https://github.com/Tygs/ayo
>

Ah, good idea.


> Most of it works, although I got rid of context() recently, but the
> lazy task part really fails.
>
>
> Indeed, the API allows you to do:
>
> async with ayo.scope() as run:
>     task_list = run.all(foo(), foo(), foo())
>     run.asap(bar())
>     await task_list.gather()
>     run.asap(baz())
>
>
>
> scope() returns a nursery-like object, and this works perfectly, with
> the usual guarantees of Trio's nurseries, but working in asyncio right
> now.
>

To be honest, I see "async with" being abused everywhere in asyncio,
lately.  I like to have objects with start() and stop() methods, but
everywhere I see async context managers.

Fine, add nursery or whatever, but please also have a simple start() /
stop() public API.

"async with" is only good for functional programming.  If you want to go
more of an object-oriented style, you tend to have start() and stop()
methods in your classes, which will call start() & stop() (or close())
methods recursively on nested resources.  So of the libraries (aiopg, I'm
looking at you) don't support start/stop or open/close well.


> However, I tried to add to the mix:
>
> async with ayo.scope(max_concurrency=2) as run:
>     task_list = run.all(foo(), foo(), foo())
>     run.asap(bar())
>     await task_list.gather()
>     run.asap(baz())
>
> And I can't get it to work. task_list will right now contain a list of
> tasks and None, because some tasks are not scheduled immediately. That's
> why I wanted lazy tasks. I tried to create my own lazy tasks, but it
> never really worked. I'm going to try to go down the road of wrapping
> the unscheduled coro in a future-like object as suggested by Yury. But
> having that built-in seems logical, elegant, and just good design in
> general: __init__ should not have side effects.
>

I tend to slightly agree, but OTOH if asyncio had been designed to not
schedule tasks automatically on __init__ I bet there would have been other
users complaining that "why didn't task XX run?", or "why do tasks need a
start() method, that is clunky!".  You can't please everyone...

Also, in
 task_list = run.all(foo(), foo(), foo())

As soon as you call foo(), you are instantiating a coroutine, which
consumes memory, while the task may not even be scheduled for a long time
(if you have 5000 potential tasks but only execute 10 at a time, for
example).

But if you do as Yury suggested, you'll instead accept a function
reference, foo, which is a singleton; you can have many references to
foo, but they will only create coroutine objects when the task is
actually about to be scheduled, so it's more efficient in terms of memory.

-- 
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert


Re: [Python-Dev] A more flexible task creation

2018-06-15 Thread Michel Desmoulin


On 14/06/2018 at 04:09, Nathaniel Smith wrote:
> How about:
> 
> async def wait_to_run(async_fn, *args):
>     await wait_for_something()
>     return await async_fn(*args)
> 
> task = loop.create_task(wait_to_run(myfunc, ...))
> 

It's quite elegant, although figuring out the wait_for_something() is
going to be tricky.


> -
> 
> Whatever strategy you use, you should also think about what semantics
> you want if one of these delayed tasks is cancelled before it starts.
> 
> For regular, non-delayed tasks, Trio makes sure that even if it gets
> cancelled before it starts, then it still gets scheduled and runs until
> the first cancellation point. This is necessary for correct resource
> hand-off between tasks:
> 
> async def some_task(handle):
>     with handle:
>         await ...
> 
> If we skipped running this task entirely, then the handle wouldn't be
> closed properly; scheduling it once allows the with block to run, and
> then get cleaned up by the cancellation exception. I'm not sure but I
> think asyncio handles pre-cancellation in a similar way. (Yury, do you
> know?)
> 
> Now, in delayed task case, there's a similar issue. If you want to keep
> the same solution, then you might want to instead write:
> 
> # asyncio
> async def wait_to_run(async_fn, *args):
>     try:
>         await wait_for_something()
>     except asyncio.CancelledError:
>         # have to create a subtask to make it cancellable
>         subtask = loop.create_task(async_fn(*args))
>         # then cancel it immediately
>         subtask.cancel()
>         # and wait for the cancellation to be processed
>         return await subtask
>     else:
>         return await async_fn(*args)
> 
> In trio, this could be simplified to
> 
> # trio
> async def wait_to_run(async_fn, *args):
>     try:
>         await wait_for_something()
>     except trio.Cancelled:
>         pass
>     return await async_fn(*args)
> 
> (This works because of trio's "stateful cancellation" – if the whole
> thing is cancelled, then as soon as async_fn hits a cancellation point
> the exception will be re-delivered.)

Thanks for the tip. Trio schedules the task in all cases, but I don't
know what asyncio does with it. I'll add a unit test for that.


Re: [Python-Dev] A more flexible task creation

2018-06-15 Thread Michel Desmoulin

> 
> The strict API compatibility requirements of core Python stdlib, coupled
> with the very long feature release life-cycles of Python, make me think
> this sort of thing perhaps is better built as a utility library on top
> of asyncio, rather than inside asyncio itself?  18 months is a long long
> time to iterate on these features.  I can't wait for Python 3.8...
>  

A lot of my recent requests come from my attempt to group some of that in
a lib: https://github.com/Tygs/ayo

Most of it works, although I got rid of context() recently, but the
lazy task part really fails.


Indeed, the API allows you to do:

async with ayo.scope() as run:
    task_list = run.all(foo(), foo(), foo())
    run.asap(bar())
    await task_list.gather()
    run.asap(baz())



scope() returns a nursery-like object, and this works perfectly, with
the usual guarantees of Trio's nurseries, but working in asyncio right
now.

However, I tried to add to the mix:

async with ayo.scope(max_concurrency=2) as run:
    task_list = run.all(foo(), foo(), foo())
    run.asap(bar())
    await task_list.gather()
    run.asap(baz())

And I can't get it to work. task_list will right now contain a list of
tasks and None, because some tasks are not scheduled immediately. That's
why I wanted lazy tasks. I tried to create my own lazy tasks, but it
never really worked. I'm going to try to go down the road of wrapping
the unscheduled coro in a future-like object as suggested by Yury. But
having that built-in seems logical, elegant, and just good design in
general: __init__ should not have side effects.


Re: [Python-Dev] A more flexible task creation

2018-06-15 Thread Steve Holden
On Thu, Jun 14, 2018 at 8:14 PM, Chris Barker via Python-Dev <
python-dev@python.org> wrote:

> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
> understand the problem here.
>


> So why do queries fail with 10,000 tasks? or ANY number? If the async DB
> access code is written right, a given query should not "await" unless it is
> in a safe state to do so.
>
> So what am I missing here???
>
> because threads ARE concurrent, and there is no advantage to having more
>> threads than can actually run at once, and having many more does cause
>> thread-switching performance issues.
>>
>
> To me, tasks are (somewhat) logically analogous to threads.
>>
>
> kinda -- in the sense that they are run (and completed) in arbitrary
> order, But they are different, and that difference is key to this issue.
>
> As Yury expressed interest in this idea, there must be something I'm
> missing.
>
> What is it?
>

All tasks need resources, and bookkeeping for such tasks is likely to slow
things down. More importantly, with an uncontrolled number of tasks you can
require an uncontrolled use of resources, decreasing efficiency to levels
well below that attainable with sensible conservation of resources.
Imagine, if you will, a task that starts by allocating 1GB of memory. Would
you want 10,000 of those?


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Nathaniel Smith
On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković  wrote:
> * my gut feeling is spawning a thousand tasks and having them all fighting
> over the same semaphore and scheduling is going to be much less efficient
> than a small number of tasks draining a queue.

Fundamentally, a Semaphore is a queue:

https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9118/Lib/asyncio/locks.py#L437

...so the two approaches are more analogous than it might appear at
first. The big difference is what objects are in the queue. For a web
scraper, the options might be either a queue where each entry is a URL
represented as a str, versus a queue where each entry is (effectively)
a Task object with attached coroutine object.

So I think the main differences you'll see in practice are:

- a Task + coroutine aren't terribly big -- maybe a few kilobytes --
but definitely larger than a str; so the Semaphore approach will take
more RAM. Modern machines have lots of RAM, so for many use cases this
is still probably fine (50,000 tasks is really not that many). But
there will certainly be some situations where the str queue fits in
RAM but the Task queue doesn't.

- If you create all those Task objects up front, then that front-loads
a chunk of work (i.e., allocating all those objects!) that otherwise
would be spread throughout the queue processing. So you'll see a
noticeable pause up front before the code starts working.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
On Thu, Jun 14, 2018 at 10:03 PM Steve Dower  wrote:

> I often use
> semaphores for this when I need it, and it looks like
> asyncio.Semaphore() is sufficient for this:
>
>
> import asyncio
> task_limiter = asyncio.Semaphore(4)
>
> async def my_task():
>     await task_limiter.acquire()
>     try:
>         await do_db_request()
>     finally:
>         task_limiter.release()


Yeah, a semaphore logically fits exactly, but

* I feel this API is somewhat clunky, even if you use an 'async with'
(see the variant sketched below).

* my gut feeling is spawning a thousand tasks and having them all fight
over the same semaphore and scheduling is going to be much less efficient
than a small number of tasks draining a queue.
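
For reference, the 'async with' variant of Steve's snippet (Semaphore
supports the async context manager protocol):

    async def my_task():
        async with task_limiter:
            await do_db_request()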


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
Other folks have already chimed in, so I'll get straight to the point. Try
writing a simple asyncio web scraper (using maybe the aiohttp library) and
create 5000 tasks for scraping different sites. My prediction is a whole
lot of them will time out due to various reasons.

Other responses inline.

On Thu, Jun 14, 2018 at 9:15 PM Chris Barker  wrote:

> async is not parallel -- all the tasks will be run in the same thread
> (Unless you explicitly spawn another thread), and only one task is running
> at once, and the task switching happens when the task specifically releases
> itself.


asyncio is mostly used for IO-heavy workloads (note the name). If you're
doing IO in asyncio, it is most definitely parallel. The point of it is
having a large number of open network connections at the same time.


> So why do queries fail with 10,000 tasks? or ANY number? If the async DB
> access code is written right, a given query should not "await" unless it is
> in a safe state to do so.
>

Imagine you have a batch job you need to do. You need to fetch a million
records from your database, and you can't use a query to get them all - you
need a million individual "get" requests. Even if Python were infinitely
fast, and your bandwidth were infinite, could your database handle opening a
million new connections in parallel, in a very short time? Mine sure can't;
even a few hundred extra connections would be a potential problem. So you
want to do the work in chunks, but still not one by one, as sketched below.
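
Something along these lines (a sketch; fetch() stands in for whatever
coroutine performs one "get" request):

    import asyncio

    async def fetch_all(record_ids, chunk_size=20):
        results = []
        for i in range(0, len(record_ids), chunk_size):
            chunk = record_ids[i:i + chunk_size]
            # at most chunk_size requests are in flight at any time
            results += await asyncio.gather(*(fetch(r) for r in chunk))
        return results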


> and threads aren't synchronous -- but they are concurrent.
>

Using threads implies coupling threads with IO. Doing requests one at a
time in a given thread. Generally called 'synchronous IO', as opposed to
asynchronous IO/asyncio.


>  because threads ARE concurrent, and there is no advantage to having more
> threads than can actually run at once, and having many more does cause
> thread-switching performance issues.
>

Weeell, technically threads in CPython aren't really concurrent (when
running Python bytecode), but for doing IO they are in practice. When doing
IO, there absolutely is an advantage to using more threads than can run at
once (in CPython only one thread running Python can run at once). You can
test it out yourself by writing a synchronous web scraper (using maybe the
requests library) and trying to scrape using a thread pool vs using a
single thread. You'll find the thread pool version is much faster, as the
sketch below shows.
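
A quick way to see it (a sketch; assumes the third-party requests
library and some list of URLs to fetch):

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    urls = ['https://example.com'] * 20  # placeholder URLs

    start = time.perf_counter()
    for url in urls:  # single thread: one request at a time
        requests.get(url)
    print('sequential:', time.perf_counter() - start)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        list(pool.map(requests.get, urls))  # threads overlap the IO waits
    print('thread pool:', time.perf_counter() - start)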


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Steve Dower

On 14Jun2018 1214, Chris Barker via Python-Dev wrote:
Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying 
to understand the problem here.


But if I have this right:

I've been using asyncio a lot lately and have encountered this
problem several times. Imagine you want to do a lot of queries
against a database, spawning 10,000 tasks in parallel will probably
cause a lot of them to fail.


async is not parallel -- all the tasks will be run in the same thread 
(Unless you explicitly spawn another thread), and only one task is 
running at once, and the task switching happens when the task 
specifically releases itself.


If the task isn't actually doing the work, but merely waiting for it to 
finish, then you can end up overloading the thing that *is* doing the 
task (e.g. the network interface, database server, other thread/process, 
file system, etc.).


Single-threaded async is actually all about *waiting* - it provides a 
convenient model to do other tasks while you are waiting for the first 
(as well as a convenient model to indicate what should be done after it 
completes - there are two conveniences here).


If the underlying thing you're doing *can* run in parallel, but becomes 
less efficient the more times you do it (for example, most file system 
operations fall into this category), you will want to limit how many 
tasks you *start*, not just how many you are waiting for. I often use 
semaphores for this when I need it, and it looks like 
asyncio.Semaphore() is sufficient for this:



import asyncio
task_limiter = asyncio.Semaphore(4)

async def my_task():
    await task_limiter.acquire()
    try:
        await do_db_request()
    finally:
        task_limiter.release()


Cheers,
Steve


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Joni Orponen
On Thu, Jun 14, 2018 at 9:17 PM Chris Barker via Python-Dev <
python-dev@python.org> wrote:

> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
> understand the problem here.
>

Vocabulary-wise 'queue depth' might be a suitable mental aid for what
people actually want to limit. The practical issue is most likely something
to do with hitting timeouts when trying to queue too much work onto a
service.

-- 
Joni Orponen


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Chris Barker via Python-Dev
Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
understand the problem here.

But if I have this right:

I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database,
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
>

async is not parallel -- all the tasks will be run in the same thread
(Unless you explicitly spawn another thread), and only one task is running
at once, and the task switching happens when the task specifically releases
itself.

If it matters in what order the tasks are performed, then you should not be
using async.

So why do queries fail with 10,000 tasks? or ANY number? If the async DB
access code is written right, a given query should not "await" unless it is
in a safe state to do so.

So what am I missing here???

What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>

still wrapping my head around the vocabulary, but async is not concurrent.

If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections,
>

and threads aren't synchronous -- but they are concurrent.


> we would use a thread pool with a limited number of threads and submit the
> jobs into its queue.
>

because threads ARE concurrent, and there is no advantage to having more
threads than can actually run at once, and having many more does cause
thread-switching performance issues.

To me, tasks are (somewhat) logically analogous to threads.
>

kinda -- in the sense that they are run (and completed) in arbitrary order,
but they are different, and that difference is key to this issue.

As Yury expressed interest in this idea, there must be something I'm
missing.

What is it?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R   (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Gustavo Carneiro
On Thu, 14 Jun 2018 at 17:40, Tin Tvrtković  wrote:

> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database,
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution
> that first comes to mind is to create an AsyncioTaskExecutor with a
> submit(coro, *args, **kwargs) method. Put a reference to the coroutine and
> its arguments into an asyncio queue. Spawn n tasks pulling from this queue
> and awaiting the coroutines.
>

> It'd probably be useful to have this in the stdlib at some point.
>

Probably a good idea, yes, because it seems a rather common use case.

OTOH, I did something similar but for a different use case.  In my case, I
have a Watchdog class that takes a list of (coro, *args, **kwargs).  What
it does is ensure there is always a task running for each of the
coroutines, and it watches the tasks: if they crash, they are automatically
restarted (with logging).  Then there is a stop() method to cancel the
watchdog-managed tasks and await them.  My use case is that I tend to
write a lot of singleton-style objects, which need bookkeeping tasks, or
redis pubsub listening tasks, and my primary concern is not starting lots
of tasks, it is that the few tasks I have must be restarted if they crash,
forever.
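
A minimal sketch of that Watchdog idea (assuming plain asyncio and the
(coro, *args, **kwargs) entries described above):

    import asyncio
    import logging

    class Watchdog:
        def __init__(self, entries):
            self._entries = entries  # [(coro_fn, args, kwargs), ...]
            self._tasks = []

        def start(self):
            for coro_fn, args, kwargs in self._entries:
                self._tasks.append(asyncio.ensure_future(
                    self._supervise(coro_fn, args, kwargs)))

        async def _supervise(self, coro_fn, args, kwargs):
            while True:  # restart forever, logging crashes
                try:
                    await coro_fn(*args, **kwargs)
                except asyncio.CancelledError:
                    raise
                except Exception:
                    logging.exception('%r crashed, restarting', coro_fn)

        async def stop(self):
            for task in self._tasks:
                task.cancel()
            await asyncio.gather(*self._tasks, return_exceptions=True)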

This is why I think it's not that hard to write "sugar" APIs on top of
asyncio, and everyone's needs will be different.

The strict API compatibility requirements of core Python stdlib, coupled
with the very long feature release life-cycles of Python, make me think
this sort of thing perhaps is better built as a utility library on top of
asyncio, rather than inside asyncio itself?  18 months is a long long time
to iterate on these features.  I can't wait for Python 3.8...


>
> Date: Wed, 13 Jun 2018 22:45:22 +0200
>> From: Michel Desmoulin 
>> To: python-dev@python.org
>> Subject: [Python-Dev] A more flexible task creation
>> Message-ID: 
>> Content-Type: text/plain; charset=utf-8
>>
>> I was working on a concurrency limiting code for asyncio, so the user
>> may submit as many tasks as one wants, but only a max number of tasks
>> will be submitted to the event loop at the same time.
>>
>> However, I wanted passing an awaitable to always return a task, no
>> matter whether the task was currently scheduled or not. The goal is that
>> you could add done callbacks to it, decide to force schedule it, etc.
>>
>> I dug in the asyncio.Task code, and encountered:
>>
>> def __init__(self, coro, *, loop=None):
>>     ...
>>     self._loop.call_soon(self._step)
>>     self.__class__._all_tasks.add(self)
>>
>> I was surprised to see that instantiating a Task class has any side
>> effect at all, let alone 2, and one of them being to be immediately
>> scheduled for execution.
>>
>> I couldn't find a clean way to do what I wanted: either you
>> loop.create_task() and you get a task but it runs, or you don't run
>> anything, but you don't get a nice task object to hold on to.
>>
>> I tried several alternatives, like returning a future, and binding the
>> future awaiting to the submission of a task, but that was complicated
>> code that duplicated a lot of things.
>>
>> I tried creating a custom task, but it was even harder: setting a custom
>> event policy, to provide a custom event loop with my own create_task()
>> accepting parameters. That's a lot to do just to provide a parameter to
>> Task, especially if you already use a custom event loop (e.g: uvloop). I
>> was expecting to have to create a task factory only, but task factories
>> can't get any additional parameters from create_task().
>>
>> Additionally I can't use ensure_future(), as it doesn't allow passing
>> any parameters to the underlying Task, so if I want to accept any
>> awaitable in my signature, I need to provide my own custom
>> ensure_future().
>>
>> All those implementations access a lot of _private_api, and do other
>> shady things that linters hate; plus they are fragile at best. What's
>> more, Task being rewritten in C prevents things like setting self._coro,
>> so we can only inherit from the pure Python slow version.
>>
>> In the end, I can't even await the lazy task, because it blocks the
>> entire program.

Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Yury Selivanov
On Thu, Jun 14, 2018 at 12:40 PM Tin Tvrtković  wrote:
>
> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem 
> several times. Imagine you want to do a lot of queries against a database, 
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution that 
> first comes to mind is to create an AsyncioTaskExecutor with a submit(coro, 
> *args, **kwargs) method. Put a reference to the coroutine and its arguments 
> into an asyncio queue. Spawn n tasks pulling from this queue and awaiting the 
> coroutines.
>
> It'd probably be useful to have this in the stdlib at some point.

Sounds like a good idea!  Feel free to open an issue to prototype the API.

Yury


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
Hi,

I've been using asyncio a lot lately and have encountered this problem
several times. Imagine you want to do a lot of queries against a database;
spawning 10,000 tasks in parallel will probably cause a lot of them to
fail. What you need is a task pool of sorts, to limit concurrency and do
only 20 requests in parallel.

If we were doing this synchronously, we wouldn't spawn 10,000 threads using
10,000 connections, we would use a thread pool with a limited number of
threads and submit the jobs into its queue.

To me, tasks are (somewhat) logically analogous to threads. The solution
that first comes to mind is to create an AsyncioTaskExecutor with a
submit(coro, *args, **kwargs) method. Put a reference to the coroutine and
its arguments into an asyncio queue. Spawn n tasks pulling from this queue
and awaiting the coroutines.

It'd probably be useful to have this in the stdlib at some point.
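
A rough sketch of what I have in mind (assuming an already-running event
loop; the names are illustrative only):

    import asyncio

    class AsyncioTaskExecutor:
        """At most n worker tasks draining a queue of submitted coros."""

        def __init__(self, n):
            self._queue = asyncio.Queue()
            self._workers = [asyncio.ensure_future(self._work())
                             for _ in range(n)]

        def submit(self, coro_fn, *args, **kwargs):
            # only the reference + arguments are stored, not a coroutine
            self._queue.put_nowait((coro_fn, args, kwargs))

        async def _work(self):
            while True:
                coro_fn, args, kwargs = await self._queue.get()
                try:
                    await coro_fn(*args, **kwargs)
                finally:
                    self._queue.task_done()

        async def join(self):
            await self._queue.join()  # wait until everything submitted ran
            for worker in self._workers:
                worker.cancel()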

Date: Wed, 13 Jun 2018 22:45:22 +0200
> From: Michel Desmoulin 
> To: python-dev@python.org
> Subject: [Python-Dev] A more flexible task creation
> Message-ID: 
> Content-Type: text/plain; charset=utf-8
>
> I was working on a concurrency limiting code for asyncio, so the user
> may submit as many tasks as one wants, but only a max number of tasks
> will be submitted to the event loop at the same time.
>
> However, I wanted passing an awaitable to always return a task, no
> matter whether the task was currently scheduled or not. The goal is that
> you could add done callbacks to it, decide to force schedule it, etc.
>
> I dug in the asyncio.Task code, and encountered:
>
> def __init__(self, coro, *, loop=None):
>     ...
>     self._loop.call_soon(self._step)
>     self.__class__._all_tasks.add(self)
>
> I was surprised to see that instantiating a Task class has any side
> effect at all, let alone 2, and one of them being to be immediately
> scheduled for execution.
>
> I couldn't find a clean way to do what I wanted: either you
> loop.create_task() and you get a task but it runs, or you don't run
> anything, but you don't get a nice task object to hold on to.
>
> I tried several alternatives, like returning a future, and binding the
> future awaiting to the submission of a task, but that was complicated
> code that duplicated a lot of things.
>
> I tried creating a custom task, but it was even harder: setting a custom
> event policy, to provide a custom event loop with my own create_task()
> accepting parameters. That's a lot to do just to provide a parameter to
> Task, especially if you already use a custom event loop (e.g: uvloop). I
> was expecting to have to create a task factory only, but task factories
> can't get any additional parameters from create_task().
>
> Additionally I can't use ensure_future(), as it doesn't allow passing
> any parameters to the underlying Task, so if I want to accept any
> awaitable in my signature, I need to provide my own custom ensure_future().
>
> All those implementations access a lot of _private_api, and do other
> shady things that linters hate; plus they are fragile at best. What's
> more, Task being rewritten in C prevents things like setting self._coro,
> so we can only inherit from the pure Python slow version.
>
> In the end, I can't even await the lazy task, because it blocks the
> entire program.
>
> Hence I have 2 distinct, but independent albeit related, proposals:
>
> - Allow Task to be created but not scheduled for execution, and add a
> parameter to ensure_future() and create_task() to control this. Awaiting
> such a task would just do like asyncio.sleep(0) until it is scheduled
> for execution.
>
> - Add a parameter to ensure_future() and create_task() named "kwargs"
> that accepts a mapping and will be passed as **kwargs to the underlying
> created Task.
>
> I insist on the fact that the 2 proposals are independent, so please
> don't reject both if you don't like one or the other. Passing a
> parameter to the underlying custom Task is still of value even without
> the unscheduled instantiation, and vice versa.
>
> Also, if somebody has any idea on how to make a LazyTask that we can
> await on without blocking everything, I'll take it.
>
>


Re: [Python-Dev] A more flexible task creation

2018-06-13 Thread Nathaniel Smith
How about:

async def wait_to_run(async_fn, *args):
    await wait_for_something()
    return await async_fn(*args)

task = loop.create_task(wait_to_run(myfunc, ...))

-

Whatever strategy you use, you should also think about what semantics you
want if one of these delayed tasks is cancelled before it starts.

For regular, non-delayed tasks, Trio makes sure that even if it gets
cancelled before it starts, then it still gets scheduled and runs until the
first cancellation point. This is necessary for correct resource hand-off
between tasks:

async def some_task(handle):
    with handle:
        await ...

If we skipped running this task entirely, then the handle wouldn't be
closed properly; scheduling it once allows the with block to run, and then
get cleaned up by the cancellation exception. I'm not sure but I think
asyncio handles pre-cancellation in a similar way. (Yury, do you know?)

Now, in the delayed task case, there's a similar issue. If you want to keep the
same solution, then you might want to instead write:

# asyncio
async def wait_to_run(async_fn, *args):
    try:
        await wait_for_something()
    except asyncio.CancelledError:
        # have to create a subtask to make it cancellable
        subtask = loop.create_task(async_fn(*args))
        # then cancel it immediately
        subtask.cancel()
        # and wait for the cancellation to be processed
        return await subtask
    else:
        return await async_fn(*args)

In trio, this could be simplified to

# trio
async def wait_to_run(async_fn, *args):
    try:
        await wait_for_something()
    except trio.Cancelled:
        pass
    return await async_fn(*args)

(This works because of trio's "stateful cancellation" – if the whole thing
is cancelled, then as soon as async_fn hits a cancellation point the
exception will be re-delivered.)

-n

On Wed, Jun 13, 2018, 13:47 Michel Desmoulin 
wrote:

> I was working on a concurrency limiting code for asyncio, so the user
> may submit as many tasks as one wants, but only a max number of tasks
> will be submitted to the event loop at the same time.
>
> However, I wanted passing an awaitable to always return a task, no
> matter whether the task was currently scheduled or not. The goal is that
> you could add done callbacks to it, decide to force schedule it, etc.
>
> I dug in the asyncio.Task code, and encountered:
>
> def __init__(self, coro, *, loop=None):
>     ...
>     self._loop.call_soon(self._step)
>     self.__class__._all_tasks.add(self)
>
> I was surprised to see that instantiating a Task class has any side
> effect at all, let alone 2, and one of them being to be immediately
> scheduled for execution.
>
> I couldn't find a clean way to do what I wanted: either you
> loop.create_task() and you get a task but it runs, or you don't run
> anything, but you don't get a nice task object to hold on to.
>
> I tried several alternatives, like returning a future, and binding the
> future awaiting to the submission of a task, but that was complicated
> code that duplicated a lot of things.
>
> I tried creating a custom task, but it was even harder: setting a custom
> event policy, to provide a custom event loop with my own create_task()
> accepting parameters. That's a lot to do just to provide a parameter to
> Task, especially if you already use a custom event loop (e.g: uvloop). I
> was expecting to have to create a task factory only, but task factories
> can't get any additional parameters from create_task().
>
> Additionally I can't use ensure_future(), as it doesn't allow passing
> any parameters to the underlying Task, so if I want to accept any
> awaitable in my signature, I need to provide my own custom ensure_future().
>
> All those implementations access a lot of _private_api, and do other
> shady things that linters hate; plus they are fragile at best. What's
> more, Task being rewritten in C prevents things like setting self._coro,
> so we can only inherit from the pure Python slow version.
>
> In the end, I can't even await the lazy task, because it blocks the
> entire program.
>
> Hence I have 2 distinct, but independent albeit related, proposals:
>
> - Allow Task to be created but not scheduled for execution, and add a
> parameter to ensure_future() and create_task() to control this. Awaiting
> such a task would just do like asyncio.sleep(0) until it is scheduled
> for execution.
>
> - Add a parameter to ensure_future() and create_task() named "kwargs"
> that accepts a mapping and will be passed as **kwargs to the underlying
> created Task.
>
> I insist on the fact that the 2 proposals are independent, so please
> don't reject both if you don't like one or the other. Passing a
> parameter to the underlying custom Task is still of value even without
> the unscheduled instantiation, and vice versa.
>
> Also, if somebody has any idea on how to make a LazyTask that we can
> await on without blocking everything, I'll take it.
>
>
>
> 

Re: [Python-Dev] A more flexible task creation

2018-06-13 Thread Yury Selivanov
On Wed, Jun 13, 2018 at 4:47 PM Michel Desmoulin
 wrote:
>
> I was working on a concurrency limiting code for asyncio, so the user
> may submit as many tasks as one wants, but only a max number of tasks
> will be submitted to the event loop at the same time.

What does that "concurrency limiting code" do?  What problem does it solve?

>
> However, I wanted passing an awaitable to always return a task, no
> matter whether the task was currently scheduled or not. The goal is that
> you could add done callbacks to it, decide to force schedule it, etc.

The obvious advice is to create a new class "DelayedTask" with a
Future-like API.  You can then schedule the real awaitable that it
wraps with `loop.create_task` at any point.  Providing
"add_done_callback"-like API is trivial.  DelayedTask can itself be an
awaitable, scheduling itself on a first __await__ call.

As a benefit, your implementation will support any Task-like objects
that alternative asyncio loops can implement. No need to mess with
policies either.
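
A minimal sketch of that DelayedTask idea (illustrative only, not a
complete implementation):

    import asyncio

    class DelayedTask:
        """Future-like wrapper that schedules its coroutine on demand."""

        def __init__(self, coro, *, loop=None):
            self._coro = coro
            self._loop = loop or asyncio.get_event_loop()
            self._task = None
            self._callbacks = []

        def _schedule(self):
            if self._task is None:
                self._task = self._loop.create_task(self._coro)
                for callback in self._callbacks:
                    self._task.add_done_callback(callback)
            return self._task

        def add_done_callback(self, callback):
            if self._task is None:
                self._callbacks.append(callback)  # kept until scheduling
            else:
                self._task.add_done_callback(callback)

        def start(self):
            return self._schedule()  # force scheduling explicitly

        def __await__(self):
            # schedules itself on the first __await__ call, then
            # delegates to the real task
            return self._schedule().__await__()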

>
> I dug in the asyncio.Task code, and encountered:
>
> def __init__(self, coro, *, loop=None):
>     ...
>     self._loop.call_soon(self._step)
>     self.__class__._all_tasks.add(self)
>
> I was surprised to see that instantiating a Task class has any side
> effect at all, let alone 2, and one of them being to be immediately
> scheduled for execution.

To be fair, implicitly scheduling a task for execution is what all
async frameworks (twisted, curio, trio) do when you wrap a coroutine
into a task.  I don't recall them having a keyword argument to control
when the task is scheduled.

>
> I couldn't find a clean way to do what I wanted: either you
> loop.create_task() and you get a task but it runs, or you don't run
> anything, but you don't get a nice task object to hold on to.

A clean way is to create a new layer of abstraction (e.g. DelayedTask
I suggested above).

[..]
> I tried creating a custom task, but it was even harder: setting a custom
> event policy, to provide a custom event loop with my own create_task()
> accepting parameters. That's a lot to do just to provide a parameter to
> Task, especially if you already use a custom event loop (e.g: uvloop). I
> was expecting to have to create a task factory only, but task factories
> can't get any additional parameters from create_task().

I don't think creating a new Task implementation is needed here, a
simple wrapper should work just fine.

[..]
> Hence I have 2 distinct, but independent albeit related, proposals:
>
> - Allow Task to be created but not scheduled for execution, and add a
> parameter to ensure_future() and create_task() to control this. Awaiting
> such a task would just do like asyncio.sleep(0) until it is scheduled
> for execution.
>
> - Add a parameter to ensure_future() and create_task() named "kwargs"
> that accepts a mapping and will be passed as **kwargs to the underlying
> created Task.
>
> I insist on the fact that the 2 proposals are independent, so please
> don't reject both if you don't like one or the other. Passing a
> parameter to the underlying custom Task is still of value even without
> the unscheduled instantiation, and vice versa.

Well, to add a 'kwargs' parameter to ensure_future() we need kwargs in
Task.__init__.  So far we only have 'loop' and it's not something that
ensure_future() should allow you to override.  So unless we implement
the first proposal, we don't need the second.

Yury


[Python-Dev] A more flexible task creation

2018-06-13 Thread Michel Desmoulin
I was working on concurrency-limiting code for asyncio, so that the user
may submit as many tasks as they want, but only a max number of tasks
will be submitted to the event loop at the same time.

However, I wanted passing an awaitable to always return a task, no
matter whether the task was currently scheduled or not. The goal is that
you could add done callbacks to it, decide to force schedule it, etc.

I dug in the asyncio.Task code, and encountered:

def __init__(self, coro, *, loop=None):
    ...
    self._loop.call_soon(self._step)
    self.__class__._all_tasks.add(self)

I was surprised to see that instantiating a Task class has any side
effect at all, let alone two, one of them being that the task is
immediately scheduled for execution.

I couldn't find a clean way to do what I wanted: either you
loop.create_task() and you get a task but it runs, or you don't run
anything, but you don't get a nice task object to hold on to.

I tried several alternatives, like returning a future, and binding the
future awaiting to the submission of a task, but that was complicated
code that duplicated a lot of things.

I tried creating a custom task, but it was even harder: setting a custom
event policy, to provide a custom event loop with my own create_task()
accepting parameters. That's a lot to do just to provide a parameter to
Task, especially if you already use a custom event loop (e.g: uvloop). I
was expecting to have to create a task factory only, but task factories
can't get any additional parameters from create_task().

Additionally I can't use ensure_future(), as it doesn't allow passing
any parameters to the underlying Task, so if I want to accept any
awaitable in my signature, I need to provide my own custom ensure_future().

All those implementations access a lot of _private_api, and do other
shady things that linters hate; plus they are fragile at best. What's
more, Task being rewritten in C prevents things like setting self._coro,
so we can only inherit from the pure Python slow version.

In the end, I can't even await the lazy task, because it blocks the
entire program.

Hence I have 2 distinct, but independent albeit related, proposals:

- Allow Task to be created but not scheduled for execution, and add a
parameter to ensure_future() and create_task() to control this. Awaiting
such a task would just do like asyncio.sleep(0) until it is scheduled
for execution.

- Add a parameter to ensure_future() and create_task() named "kwargs"
that accepts a mapping and will be passed as **kwargs to the underlying
created Task.

I insist on the fact that the 2 proposals are independent, so please
don't reject both if you don't like one or the other. Passing a
parameter to the underlying custom Task is still of value even without
the unscheduled instantiation, and vice versa.

Also, if somebody has any idea on how to make a LazyTask that we can
await on without blocking everything, I'll take it.


