Spencer Nelson <swnel...@uw.edu> added the comment:

Josh,

> Making literally every await equivalent to:
> 
> await asyncio.sleep(0)
> 
> followed by the actual await (which is effectively what you're proposing when 
> you expect all await to be preemptible) means adding non-trivial overhead to 
> all async operations (asyncio is based on system calls of the 
> select/poll/epoll/kpoll variety, which add meaningful overhead when we're 
> talking about an operation that is otherwise equivalent to an extremely cheap 
> simple collections.deque.append call).

A few things:

First, I don't think I proposed that. I was simply saying that my expectations 
on behavior were incorrect, which points towards documentation.

Second, I don't think making a point "preemptible" is the same as actually 
executing a cooperative-style yield to the scheduler at that point. I just 
expected that a yield would always be in the cards - that every await would be 
a *potential* point where I could get scheduled away.

Third, I don't think that await asyncio.sleep(0) triggers a syscall, but I 
certainly could be mistaken. From my reading of the source, it looks like a 
delay of zero is special-cased in asyncio as a bare yield to the event loop, 
with no timer or selector call involved. Again - could be wrong.
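
For what it's worth, this is easy to check empirically. A small sketch (the 
names are mine) showing that await asyncio.sleep(0) yields to the event loop 
and interleaves two tasks, with no visible timer delay:

```python
import asyncio

async def chatty(name, out):
    # asyncio.sleep(0) hands control back to the event loop without
    # arming a timer, so the two tasks interleave strictly.
    for i in range(3):
        out.append((name, i))
        await asyncio.sleep(0)

async def main():
    out = []
    await asyncio.gather(chatty("a", out), chatty("b", out))
    return out

print(asyncio.run(main()))
# [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```

Replace sleep(0) with a plain `pass` and task "a" runs to completion before 
"b" ever starts, which is the whole point of contention here.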

Fourth, I think the idea of non-cooperative, preemptive scheduling is not 
nearly as bizarre as you make it sound. There's certainly plenty of prior art 
on preemptive schedulers out there. Go uses a sort of partial preemption, with 
checks at function call sites, *because* it's a particularly efficient way to 
do things.

But anyway - I didn't really want to discuss this. As I said above, it's 
obviously a way way way bigger design discussion than my specific issue.


> It also breaks many reasonable uses of asyncio.wait and asyncio.as_completed, 
> where the caller can reasonably expect to be able to await the known-complete 
> tasks without being preempted (if you know the coroutine is actually done, it 
> could be quite surprising/problematic when you await it and get preempted, 
> potentially requiring synchronization that wouldn't be necessary otherwise).

I think this cuts both ways. Without reading the source code of asyncio.Queue, 
I don't see how it's possible to know whether its put method yields. Because of 
this, I tend to assume synchronization is necessary everywhere. The way I know 
for sure that a function call can complete without yielding is supposed to be 
that it isn't an `async` function, right? That's why asyncio.Queue.put_nowait 
exists and isn't asynchronous.
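
To make the expectation concrete, here's a sketch of the behavior in question 
(the names are mine): awaiting put on a never-full queue completes without 
ever suspending - effectively the same as put_nowait - so a concurrently 
scheduled task never gets a chance to run until something else yields:

```python
import asyncio

async def main():
    q = asyncio.Queue()  # unbounded
    ran = []

    async def bystander():
        ran.append(True)

    asyncio.create_task(bystander())

    # Ten awaited puts into a never-full queue: none of them suspends,
    # so the bystander task still hasn't run at this point.
    for i in range(10):
        await q.put(i)
    before = list(ran)

    await asyncio.sleep(0)  # one explicit yield finally lets it run
    return before, list(ran)

before, after = asyncio.run(main())
print(before, after)  # [] [True]
```

Nothing in put's signature distinguishes it from an await that suspends - 
which is exactly why I assume synchronization is needed everywhere.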

> In real life, if whatever you're feeding the queue with is infinite and 
> requires no awaiting to produce each value, you should probably just avoid 
> the queue and have the consumer consume the iterable directly.

The stuff I'm feeding the queue doesn't require awaiting, but I *wish* it did. 
It's just a case of not having the libraries for asynchronicity yet on the 
source side. I was hoping that the queue would let me pace my work in a way 
that would let me do more concurrent work.

> Or just apply a maximum size to the queue; since the source of data to put is 
> infinite and not-awaitable, there's no benefit to an unbounded queue, you may 
> as well use a bound roughly fitted to the number of consumers, because any 
> further items are just wasting memory well ahead of when it's needed.

The problem isn't really that put doesn't yield for unbounded queues - it's 
that put doesn't yield *unless the queue is full*. That means that, if I use a 
very high maximum size for the queue, I'll still spend a big chunk of time 
filling up the queue, and only then will consumers start doing work.

I could pick a small queue bound, but then I'm more likely to waste time doing 
nothing if consumers are slower than the producer - I'll sit there with a 
full-but-tiny queue. Work units in the queue can take wildly different amounts 
of time, so consumers will often be briefly slow while chewing on long items, 
and the producer races ahead until it hits its tiny limit and blocks. Then the 
long items finish, the consumers are fast again - and they quickly drain the 
tiny queue and sit starved for work, because the producer was never allowed to 
build up a decent backlog.

So the problem remains whenever work takes an uncertain amount of time - which 
would seem to be the most common reason for using a queue in the first place.
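
For the record, the workaround I've ended up with is an explicit cooperative 
yield after each put. A sketch (single consumer, my own names), which keeps 
the consumer fed as items are produced instead of only after the queue fills:

```python
import asyncio

async def producer(q, n):
    for i in range(n):
        await q.put(i)
        # Explicit cooperative yield: without this line, the producer
        # fills the entire unbounded queue before any consumer runs.
        await asyncio.sleep(0)

async def consumer(q, seen):
    while True:
        item = await q.get()
        seen.append(item)
        q.task_done()

async def main():
    q = asyncio.Queue()
    seen = []
    c = asyncio.create_task(consumer(q, seen))
    await producer(q, 100)
    await q.join()  # wait until every item has been processed
    c.cancel()
    return seen

print(len(asyncio.run(main())))  # 100
```

It works, but it's exactly the kind of line I'd expect the library to make 
unnecessary - or at least document as necessary.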

> Point is, regular queue puts only block (and potentially release the GIL 
> early) when they're full or, as a necessary consequence of threading being 
> less predictable than asyncio, when there is contention on the lock 
> protecting the queue internals (which is usually resolved quickly); why would 
> asyncio queues go out of their way to block when they don't need to?

I think you have it backwards. For unbounded queues, asyncio.Queue.put *never* 
yields, so a producer looping over put blocks every other coroutine's 
execution the whole time. Why block them when you don't need to? If I wanted 
that behavior, I wouldn't use asyncio.Queue at all - I'd just use a 
collections.deque.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43119>
_______________________________________