Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Nathaniel Smith
On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković  wrote:
> * my gut feeling is spawning a thousand tasks and having them all fighting
> over the same semaphore and scheduling is going to be much less efficient
> than a small number of tasks draining a queue.

Fundamentally, a Semaphore is a queue:

https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9118/Lib/asyncio/locks.py#L437

...so the two approaches are more analogous than it might appear at
first. The big difference is what objects are in the queue. For a web
scraper, the options might be either a queue where each entry is a URL
represented as a str, versus a queue where each entry is (effectively)
a Task object with attached coroutine object.

So I think the main differences you'll see in practice are:

- a Task + coroutine aren't terribly big -- maybe a few kilobytes --
but definitely larger than a str; so the Semaphore approach will take
more RAM. Modern machines have lots of RAM, so for many use cases this
is still probably fine (50,000 tasks is really not that many). But
there will certainly be some situations where the str queue fits in
RAM but the Task queue doesn't.

- If you create all those Task objects up front, then that front-loads
a chunk of work (i.e., allocating all those objects!) that otherwise
would be spread throughout the queue processing. So you'll see a
noticeable pause up front before the code starts working.
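For concreteness, a rough sketch of the "small number of tasks draining a
queue" approach might look like this (illustrative only; fetch() and the URL
list are placeholders, not something from this thread):

import asyncio

async def fetch(url):
    await asyncio.sleep(0.1)   # stand-in for the real work, e.g. an HTTP request

async def worker(queue):
    # each worker repeatedly pulls a plain str URL off the queue
    while True:
        url = await queue.get()
        try:
            await fetch(url)
        finally:
            queue.task_done()

async def main(urls, n_workers=20):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)          # only small str objects sit in the queue
    workers = [asyncio.ensure_future(worker(queue)) for _ in range(n_workers)]
    await queue.join()                 # wait until every URL has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

asyncio.get_event_loop().run_until_complete(
    main(["https://example.com/%d" % i for i in range(1000)]))

Here only n_workers Task objects ever exist at once, whereas the Semaphore
approach creates one Task per URL up front.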

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Python-Dev Digest, Vol 179, Issue 21

2018-06-14 Thread casanova yassine
The Jseries acknowledgement by using Jetty containers can get you a better
resolution to Python wheel asynchronism bugs.
Sent from an Android smartphone with GMX Mail.

On 14/06/2018, 4:00 PM, python-dev-requ...@python.org wrote:


On 13 Jun 2018, at 15:42, Nick Coghlan <ncogh...@gmail.com> wrote:

On 13 June 2018 at 02:23, Guido van Rossum <gu...@python.org> wrote:
So, to summarize, we need something like six for C?

Yeah, pretty much - once we can get to the point where it's routine for folks 
to be building "abiX" or "abiXY" wheels (with the latter not actually being a 
defined compatibility tag yet, but having the meaning of "targets the stable 
ABI as first defined in CPython X.Y"), rather than feature release specific 
"cpXYm" ones, then a *lot* of the extension module maintenance pain otherwise 
arising from more frequent CPython releases should be avoided.

There'd still be a lot of other details to work out to turn the proposed 
release cadence change into a practical reality, but this is the key piece that 
I think is a primarily technical hurdle: simplifying the current 
"wheel-per-python-version-per-target-platform" community project build matrices 
to instead be "wheel-per-target-platform".

This requires getting people to mostly stop using the non-stable ABI, and that 
could be a lot of work for projects that have existing C extensions that don’t 
use the stable ABI or cython/cffi/…

That said, the CPython API tends to be fairly stable over releases and even 
without using the stable ABI supporting faster CPython feature releases 
shouldn’t be too onerous, especially for projects with some kind of automation 
for creating release artefacts (such as a CI system).

Ronald



Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
On Thu, Jun 14, 2018 at 10:03 PM Steve Dower  wrote:

> I often use
> semaphores for this when I need it, and it looks like
> asyncio.Semaphore() is sufficient for this:
>
>
> import asyncio
> task_limiter = asyncio.Semaphore(4)
>
> async def my_task():
>     await task_limiter.acquire()
>     try:
>         await do_db_request()
>     finally:
>         task_limiter.release()


Yeah, a semaphore logically fits exactly but

* I feel this API is somewhat clunky, even if you use an 'async with'.

* my gut feeling is spawning a thousand tasks and having them all fighting
over the same semaphore and scheduling is going to be much less efficient
than a small number of tasks draining a queue.
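
For reference, the 'async with' form mentioned in the first point would look
roughly like this (a sketch reusing the task_limiter and do_db_request names
from the example quoted above):

import asyncio

task_limiter = asyncio.Semaphore(4)

async def do_db_request():
    await asyncio.sleep(0.1)   # stand-in for the real database call

async def my_task():
    # the context manager handles acquire/release instead of try/finally
    async with task_limiter:
        await do_db_request()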


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
Other folks have already chimed in, so I'll be to the point. Try writing a
simple asyncio web scraper (using maybe the aiohttp library) and create
5000 tasks for scraping different sites. My prediction is a whole lot of
them will time out for various reasons.
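
For anyone who wants to try that experiment, a rough sketch of the naive
version (assuming aiohttp is installed; the URL list here is made up):

import asyncio
import aiohttp

async def fetch(session, url):
    # with ~5000 of these in flight at once, many are likely to time out
    async with session.get(url) as resp:
        return await resp.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.ensure_future(fetch(session, u)) for u in urls]
        # return_exceptions=True so individual failures don't abort the batch
        return await asyncio.gather(*tasks, return_exceptions=True)

urls = ["https://example.com/page/%d" % i for i in range(5000)]
results = asyncio.get_event_loop().run_until_complete(main(urls))
print(sum(1 for r in results if isinstance(r, Exception)), "failures")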

Other responses inline.

On Thu, Jun 14, 2018 at 9:15 PM Chris Barker  wrote:

> async is not parallel -- all the tasks will be run in the same thread
> (Unless you explicitly spawn another thread), and only one task is running
> at once, and the task switching happens when the task specifically releases
> itself.


asyncio is mostly used for IO-heavy workloads (note the name). If you're
doing IO in asyncio, it is most definitely parallel. The point of it is
having a large number of open network connections at the same time.


> So why do queries fail with 10,000 tasks? or ANY number? If the async DB
> access code is written right, a given query should not "await" unless it is
> in a safe state to do so.
>

Imagine you have a batch job you need to do. You need to fetch a million
records from your database, and you can't use a query to get them all - you
need a million individual "get" requests. Even if Python was infinitely
fast, and your bandwidth was infinite, can your database handle opening a
million new connections in parallel, in a very short time? Mine sure can't,
even a few hundred extra connections would be a potential problem. So you
want to do the work in chunks, but still not one by one.
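
A minimal sketch of that "in chunks, but still not one by one" idea
(get_record() is a placeholder for one of those individual requests):

import asyncio

async def get_record(record_id):
    await asyncio.sleep(0.01)          # stand-in for one "get" request
    return record_id

async def fetch_all(record_ids, chunk_size=20):
    results = []
    # at most chunk_size requests (and connections) are in flight at a time
    for i in range(0, len(record_ids), chunk_size):
        chunk = record_ids[i:i + chunk_size]
        results.extend(await asyncio.gather(*(get_record(r) for r in chunk)))
    return results

asyncio.get_event_loop().run_until_complete(fetch_all(list(range(1000))))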


> and threads aren't synchronous -- but they are concurrent.
>

Using threads implies coupling threads with IO: doing requests one at a
time in a given thread. This is generally called 'synchronous IO', as opposed
to asynchronous IO/asyncio.


>  because threads ARE concurrent, and there is no advantage to having more
> threads than can actually run at once, and having many more does cause
> thread-switching performance issues.
>

Weeell technically threads in CPython aren't really concurrent (when
running Python bytecode), but for doing IO they are in practice. When doing
IO, there absolutely is an advantage to using more threads than can run at
once (in CPython only one thread running Python can run at once). You can
test it out yourself by writing a synchronous web scraper (using maybe the
requests library) and trying to scrape using a threadpool vs using a single
thread. You'll find the threadpool version is much faster.
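
A rough version of that comparison (a sketch assuming the requests library;
the URLs are placeholders):

import time
from concurrent.futures import ThreadPoolExecutor

import requests

urls = ["https://example.com/page/%d" % i for i in range(100)]

def fetch(url):
    # the GIL is released while waiting on the network, so threads overlap here
    return requests.get(url).status_code

start = time.monotonic()
single = [fetch(u) for u in urls]              # one request at a time
print("single thread:", time.monotonic() - start)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=20) as pool:
    pooled = list(pool.map(fetch, urls))       # up to 20 requests in flight
print("thread pool:  ", time.monotonic() - start)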


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Steve Dower

On 14Jun2018 1214, Chris Barker via Python-Dev wrote:
> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying
> to understand the problem here.
>
> But if I have this right:
>
>> I've been using asyncio a lot lately and have encountered this
>> problem several times. Imagine you want to do a lot of queries
>> against a database, spawning 10,000 tasks in parallel will probably
>> cause a lot of them to fail.
>
> async is not parallel -- all the tasks will be run in the same thread
> (Unless you explicitly spawn another thread), and only one task is
> running at once, and the task switching happens when the task
> specifically releases itself.


If the task isn't actually doing the work, but merely waiting for it to 
finish, then you can end up overloading the thing that *is* doing the 
task (e.g. the network interface, database server, other thread/process, 
file system, etc.).


Single-threaded async is actually all about *waiting* - it provides a 
convenient model to do other tasks while you are waiting for the first 
(as well as a convenient model to indicate what should be done after it 
completes - there are two conveniences here).


If the underlying thing you're doing *can* run in parallel, but becomes 
less efficient the more times you do it (for example, most file system 
operations fall into this category), you will want to limit how many 
tasks you *start*, not just how many you are waiting for. I often use 
semaphores for this when I need it, and it looks like 
asyncio.Semaphore() is sufficient for this:



import asyncio
task_limiter = asyncio.Semaphore(4)

async def my_task():
    await task_limiter.acquire()
    try:
        await do_db_request()
    finally:
        task_limiter.release()


Cheers,
Steve


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Joni Orponen
On Thu, Jun 14, 2018 at 9:17 PM Chris Barker via Python-Dev <
python-dev@python.org> wrote:

> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
> understand the problem here.
>

Vocabulary-wise 'queue depth' might be a suitable mental aid for what
people actually want to limit. The practical issue is most likely something
to do with hitting timeouts when trying to queue too much work onto a
service.

-- 
Joni Orponen


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Chris Barker via Python-Dev
Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to
understand the problem here.

But if I have this right:

> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database,
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
>

async is not parallel -- all the tasks will be run in the same thread
(Unless you explicitly spawn another thread), and only one task is running
at once, and the task switching happens when the task specifically releases
itself.

If it matters in what order the tasks are performed, then you should not be
using async.

So why do queries fail with 10,000 tasks? or ANY number? If the async DB
access code is written right, a given query should not "await" unless it is
in a safe state to do so.

So what am I missing here???

> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>

still wrapping my head around the vocabulary, but async is not concurrent.

> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections,
>

and threads aren't synchronous -- but they are concurrent.


> we would use a thread pool with a limited number of threads and submit the
> jobs into its queue.
>

because threads ARE concurrent, and there is no advantage to having more
threads than can actually run at once, and having many more does cause
thread-switching performance issues.

> To me, tasks are (somewhat) logically analogous to threads.
>

kinda -- in the sense that they are run (and completed) in arbitrary order,
but they are different, and that difference is key to this issue.

As Yury expressed interest in this idea, there must be something I'm
missing.

What is it?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Idea: reduce GC threshold in development mode (-X dev)

2018-06-14 Thread Antoine Pitrou
On Fri, 8 Jun 2018 09:48:03 +0200
Victor Stinner  wrote:
> 
> Question: Do you think that bugs spotted by a GC collection are common
> enough to change the GC thresholds in development mode (new -X dev
> flag of Python 3.7)?

I don't think replacing a more-or-less arbitrary value with another
more-or-less arbitrary value is a very useful change.

Regards

Antoine.




Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Gustavo Carneiro
On Thu, 14 Jun 2018 at 17:40, Tin Tvrtković  wrote:

> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem
> several times. Imagine you want to do a lot of queries against a database,
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution
> that first comes to mind is to create an AsyncioTaskExecutor with a
> submit(coro, *args, **kwargs) method. Put a reference to the coroutine and
> its arguments into an asyncio queue. Spawn n tasks pulling from this queue
> and awaiting the coroutines.
>

> It'd probably be useful to have this in the stdlib at some point.
>

Probably a good idea, yes, because it seems a rather common use case.

OTOH, I did something similar but for a different use case.  In my case, I
have a Watchdog class, that takes a list of (coro, *args, **kwargs).  What
it does is ensure there is always a task for each of the co-routines
running, and watches the tasks, if they crash they are automatically
restarted (with logging).  Then there is a stop() method to cancel the
watchdog-managed tasks and await them.  My use case is because I tend to
write a lot of singleton-style objects, which need bookkeeping tasks, or
redis pubsub listening tasks, and my primary concern is not starting lots
of tasks, it is that the few tasks I have must be restarted if they crash,
forever.
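
A minimal sketch of such a watchdog, under the assumptions above (Watchdog
and stop() are from the description; the other names are illustrative):

import asyncio
import logging

class Watchdog:
    def __init__(self, specs):
        # specs: a list of (coro_function, args, kwargs) tuples to keep running
        self._tasks = [asyncio.ensure_future(self._supervise(c, a, kw))
                       for c, a, kw in specs]

    async def _supervise(self, coro_func, args, kwargs):
        while True:
            try:
                await coro_func(*args, **kwargs)
            except asyncio.CancelledError:
                raise                          # stop() was called
            except Exception:
                logging.exception("%r crashed, restarting", coro_func)
            await asyncio.sleep(1)             # avoid spinning on instant crashes

    async def stop(self):
        # cancel the supervised tasks and wait for them to finish
        for t in self._tasks:
            t.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)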

This is why I think it's not that hard to write "sugar" APIs on top of
asyncio, and everyone's needs will be different.

The strict API compatibility requirements of core Python stdlib, coupled
with the very long feature release life-cycles of Python, make me think
this sort of thing perhaps is better built in a utility library on top of
asyncio, rather than inside asyncio itself?  18 months is a long long time
to iterate on these features.  I can't wait for Python 3.8...


>
> Date: Wed, 13 Jun 2018 22:45:22 +0200
>> From: Michel Desmoulin 
>> To: python-dev@python.org
>> Subject: [Python-Dev] A more flexible task creation
>> Message-ID: 
>> Content-Type: text/plain; charset=utf-8
>>
>> I was working on a concurrency limiting code for asyncio, so the user
>> may submit as many tasks as one wants, but only a max number of tasks
>> will be submitted to the event loop at the same time.
>>
>> However, I wanted that passing an awaitable would always return a task,
>> no matter if the task was currently scheduled or not. The goal is that
>> you could add done callbacks to it, decide to force schedule it, etc
>>
>> I dug in the asyncio.Task code, and encountered:
>>
>> def __init__(self, coro, *, loop=None):
>>     ...
>>     self._loop.call_soon(self._step)
>>     self.__class__._all_tasks.add(self)
>>
>> I was surprised to see that instantiating a Task class has any side
>> effect at all, let alone 2, and one of them being to be immediately
>> scheduled for execution.
>>
>> I couldn't find a clean way to do what I wanted: either you
>> loop.create_task() and you get a task but it runs, or you don't run
>> anything, but you don't get a nice task object to hold on to.
>>
>> I tried several alternatives, like returning a future, and binding the
>> future awaiting to the submission of a task, but that was complicated
>> code that duplicated a lot of things.
>>
>> I tried creating a custom task, but it was even harder, setting a custom
>> event policy, to provide a custom event loop with my own create_task()
>> accepting parameters. That's a lot to do just to provide a parameter to
>> Task, especially if you already use a custom event loop (e.g: uvloop). I
>> was expecting to have to create a task factory only, but task factories
>> can't get any additional parameters from create_task().
>>
>> Additionally I can't use ensure_future(), as it doesn't allow passing
>> any parameter to the underlying Task, so if I want to accept any
>> awaitable in my signature, I need to provide my own custom
>> ensure_future().
>>
>> All those implementations access a lot of _private_api, and do other
>> shady things that linters hate; plus they are fragile at best. What's
>> more, Task being rewritten in C prevents things like setting self._coro,
>> so we can only inherit from the pure Python slow version.
>>
>> In the end, I can't even await the lazy task, because it blocks the
>> entire program.
>>
>> Hence I have 2 distinct, but independent albeit related, proposals:
>>
>> - Allow Task to be created but not scheduled for execution, and add a
>> parameter to ensure_future() and create_task() to control this. Awaiting
>> such a task would just do like asyncio.sleep(0) until it is scheduled
>> for execution.
>>
>> - 

Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Yury Selivanov
On Thu, Jun 14, 2018 at 12:40 PM Tin Tvrtković  wrote:
>
> Hi,
>
> I've been using asyncio a lot lately and have encountered this problem 
> several times. Imagine you want to do a lot of queries against a database, 
> spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
> What you need is a task pool of sorts, to limit concurrency and do only 20
> requests in parallel.
>
> If we were doing this synchronously, we wouldn't spawn 10,000 threads using
> 10,000 connections, we would use a thread pool with a limited number of
> threads and submit the jobs into its queue.
>
> To me, tasks are (somewhat) logically analogous to threads. The solution that 
> first comes to mind is to create an AsyncioTaskExecutor with a submit(coro, 
> *args, **kwargs) method. Put a reference to the coroutine and its arguments 
> into an asyncio queue. Spawn n tasks pulling from this queue and awaiting the 
> coroutines.
>
> It'd probably be useful to have this in the stdlib at some point.

Sounds like a good idea!  Feel free to open an issue to prototype the API.

Yury


Re: [Python-Dev] A more flexible task creation

2018-06-14 Thread Tin Tvrtković
Hi,

I've been using asyncio a lot lately and have encountered this problem
several times. Imagine you want to do a lot of queries against a database,
spawning 10,000 tasks in parallel will probably cause a lot of them to fail.
What you need is a task pool of sorts, to limit concurrency and do only 20
requests in parallel.

If we were doing this synchronously, we wouldn't spawn 10,000 threads using
10,000 connections, we would use a thread pool with a limited number of
threads and submit the jobs into its queue.

To me, tasks are (somewhat) logically analogous to threads. The solution
that first comes to mind is to create an AsyncioTaskExecutor with a
submit(coro, *args, **kwargs) method. Put a reference to the coroutine and
its arguments into an asyncio queue. Spawn n tasks pulling from this queue
and awaiting the coroutines.

It'd probably be useful to have this in the stdlib at some point.
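
A rough sketch of that idea (AsyncioTaskExecutor and submit() are taken from
the description above; everything else is illustrative):

import asyncio

class AsyncioTaskExecutor:
    def __init__(self, n=20):
        self._queue = asyncio.Queue()
        # n worker tasks pull submitted coroutines off the queue and await them
        self._workers = [asyncio.ensure_future(self._worker()) for _ in range(n)]

    def submit(self, coro, *args, **kwargs):
        # queue a reference to the coroutine function and its arguments
        future = asyncio.get_event_loop().create_future()
        self._queue.put_nowait((coro, args, kwargs, future))
        return future              # callers can await the eventual result

    async def _worker(self):
        while True:
            coro, args, kwargs, future = await self._queue.get()
            try:
                future.set_result(await coro(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                self._queue.task_done()

    async def join(self):
        # wait for everything submitted so far, then stop the workers
        await self._queue.join()
        for w in self._workers:
            w.cancel()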

Date: Wed, 13 Jun 2018 22:45:22 +0200
> From: Michel Desmoulin 
> To: python-dev@python.org
> Subject: [Python-Dev] A more flexible task creation
> Message-ID: 
> Content-Type: text/plain; charset=utf-8
>
> I was working on a concurrency limiting code for asyncio, so the user
> may submit as many tasks as one wants, but only a max number of tasks
> will be submitted to the event loop at the same time.
>
> However, I wanted that passing an awaitable would always return a task,
> no matter if the task was currently scheduled or not. The goal is that
> you could add done callbacks to it, decide to force schedule it, etc
>
> I dug in the asyncio.Task code, and encountered:
>
> def __init__(self, coro, *, loop=None):
>     ...
>     self._loop.call_soon(self._step)
>     self.__class__._all_tasks.add(self)
>
> I was surprised to see that instantiating a Task class has any side
> effect at all, let alone 2, and one of them being to be immediately
> scheduled for execution.
>
> I couldn't find a clean way to do what I wanted: either you
> loop.create_task() and you get a task but it runs, or you don't run
> anything, but you don't get a nice task object to hold on to.
>
> I tried several alternatives, like returning a future, and binding the
> future awaiting to the submission of a task, but that was complicated
> code that duplicated a lot of things.
>
> I tried creating a custom task, but it was even harder, setting a custom
> event policy, to provide a custom event loop with my own create_task()
> accepting parameters. That's a lot to do just to provide a parameter to
> Task, especially if you already use a custom event loop (e.g: uvloop). I
> was expecting to have to create a task factory only, but task factories
> can't get any additional parameters from create_task().
>
> Additionally I can't use ensure_future(), as it doesn't allow passing
> any parameter to the underlying Task, so if I want to accept any
> awaitable in my signature, I need to provide my own custom ensure_future().
>
> All those implementations access a lot of _private_api, and do other
> shady things that linters hate; plus they are fragile at best. What's
> more, Task being rewritten in C prevents things like setting self._coro,
> so we can only inherit from the pure Python slow version.
>
> In the end, I can't even await the lazy task, because it blocks the
> entire program.
>
> Hence I have 2 distinct, but independent albeit related, proposals:
>
> - Allow Task to be created but not scheduled for execution, and add a
> parameter to ensure_future() and create_task() to control this. Awaiting
> such a task would just do like asyncio.sleep(0) until it is scheduled
> for execution.
>
> - Add a parameter to ensure_future() and create_task() named "kwargs"
> that accepts a mapping and will be passed as **kwargs to the underlying
> created Task.
>
> I insist on the fact that the 2 proposals are independent, so please
> don't reject both if you don't like one or the other. Passing a
> parameter to the underlying custom Task is still of value even without
> the unscheduled instantiation, and vice versa.
>
> Also, if somebody has any idea on how to make a LazyTask that we can
> await on without blocking everything, I'll take it.
>
>