Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-23 Thread MRAB

On 2018-10-23 06:13, Nathaniel Smith wrote:

On Sun, Oct 21, 2018 at 8:31 PM, Guido van Rossum  wrote:

On Sun, Oct 21, 2018 at 6:08 PM Nathaniel Smith  wrote:

I'm not sure if this is an issue the way Queue is used in practice, but in
general you have to be careful with this kind of circular flow because if
your queue communicates backpressure (which it should) then circular flows
can deadlock.


Nathaniel, would you be able to elaborate more on the issue of backpressure?
I think a lot of people here are not really familiar with the concepts and
its importance, and it changes how you have to think about queues and the
like.


Sure.

Suppose you have some kind of producer connected to some kind of
consumer. If the producer consistently runs faster than the consumer,
what should happen? By default with queue.Queue, there's no limit on
its internal buffer, so if the producer puts, say, 10 items per
second, and the consumer only gets, say, 1 item per second, then the
internal buffer grows by 9 items per second. Basically you have a
memory leak, which will eventually crash your program. And well before
that, your latency will become terrible. How can we avoid this?


[snip]
The purpose of the sentinel is to tell the consumer(s) that there are no
more items, that the producer has finished producing. The sentinel is
the last item in the queue; there will be no more items after it, and
backpressure is not an issue.



Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-22 Thread Nathaniel Smith
On Sun, Oct 21, 2018 at 8:31 PM, Guido van Rossum  wrote:
> On Sun, Oct 21, 2018 at 6:08 PM Nathaniel Smith  wrote:
>> I'm not sure if this is an issue the way Queue is used in practice, but in
>> general you have to be careful with this kind of circular flow because if
>> your queue communicates backpressure (which it should) then circular flows
>> can deadlock.
>
> Nathaniel, would you be able to elaborate more on the issue of backpressure?
> I think a lot of people here are not really familiar with the concepts and
> its importance, and it changes how you have to think about queues and the
> like.

Sure.

Suppose you have some kind of producer connected to some kind of
consumer. If the producer consistently runs faster than the consumer,
what should happen? By default with queue.Queue, there's no limit on
its internal buffer, so if the producer puts, say, 10 items per
second, and the consumer only gets, say, 1 item per second, then the
internal buffer grows by 9 items per second. Basically you have a
memory leak, which will eventually crash your program. And well before
that, your latency will become terrible. How can we avoid this?

I guess we could avoid this by carefully engineering our systems to
make sure that producers always run slower than consumers, but that's
difficult and fragile. Instead, what we usually want to do is to
dynamically detect when a producer is outrunning a consumer, and apply
*backpressure*. (It's called that b/c it involves the consumer
"pushing back" against the producer.) The simplest way is to put a
limit on how large our Queue's buffer can grow, and make put() block
if it would exceed this limit. That way producers are automatically
slowed down, because they have to wait for the consumer to drain the
buffer before they can continue executing.
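
For concreteness, here is a minimal sketch of that pattern with
queue.Queue (the names and numbers are made up for illustration):

import queue
import threading
import time

q = queue.Queue(maxsize=10)   # bounded buffer: put() blocks once 10 items are waiting

def producer():
    for i in range(50):
        q.put(i)              # blocks whenever the buffer is full, i.e. backpressure
    q.put(None)               # sentinel: nothing more is coming

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.1)       # simulate a consumer much slower than the producer

threading.Thread(target=producer).start()
consumer()

The buffer never grows beyond 10 items, however fast the producer runs.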

This simple approach also works well when you have several tasks
arranged in a pipeline like A -> B -> C, where B gets objects from A,
does some processing, and then puts new items on to C. If C is running
slow, this will eventually apply backpressure to B, which will block
in put(), and then since B is blocked and not calling get(), then A
will eventually get backpressure too. In fact, this works fine for any
acyclic network topology.

If you have a cycle though, like A -> B -> C -> A, then you at least
potentially have the risk of deadlock, where every task is blocked in
put(), and can't continue until the downstream task calls get(), but
it never will because it's blocked in put() too. Sometimes it's OK and
won't deadlock, but you need to think carefully about the details to
figure that out.
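
Here is a tiny sketch of that failure mode with a two-stage cycle
(made-up names; the timeout is only there so the demo exits instead of
hanging):

import queue
import threading

q_ab = queue.Queue(maxsize=1)   # A -> B
q_ba = queue.Queue(maxsize=1)   # B -> A

def stage(name, inbox, outbox):
    try:
        # Each stage tries to emit two items downstream before servicing its inbox...
        for i in range(2):
            outbox.put((name, i), timeout=2)   # without the timeout this blocks forever
        while True:
            inbox.get()
    except queue.Full:
        # ...so both end up stuck in put(), each waiting for the other to call get().
        print(name, "timed out in put(): the cycle has deadlocked")

threading.Thread(target=stage, args=("A", q_ba, q_ab)).start()
threading.Thread(target=stage, args=("B", q_ab, q_ba)).start()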

If a task gets and puts to the same queue, like someone suggested
doing for the sentinel value upthread, then that's a cycle and you
need to do some more analysis. (I guess if you have a single sentinel
value, then queue.Queue is probably OK, since the minimal buffer size
it supports is 1? So when the last thread get()s the sentinel, it
knows that there's at least 1 free space in the buffer, and can put()
it back without blocking. But if there's a risk of somehow getting
multiple sentinel values, or if Queues ever gain support for
zero-sized buffers, then this pattern could deadlock.)

There's a runnable example here:
https://trio.readthedocs.io/en/latest/reference-core.html#buffering-in-channels
And I also wrote about backpressure and asyncio here:
https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-1-backpressure

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-22 Thread Vladimir Filipović
Nathaniel, thank you for the pointer to Trio.
Its approach seems very robust. I'm relieved to see that a solution so
fundamentally rebuilt has also settled on very similar semantics for
its `.close_put()`.

I think your `.clone()` idiom is very clear when the communication
objects are treated as distinct endpoints. Something with similar
effects on closing (not necessarily similar in idiom) would probably
be a neat enhancement to the standard Queue, though if I was making
that, I'd do it in an external package.

--

Antoine, regarding multiprocessing.Queue:

The similarity of meaning behind closing that I was getting at is that
mp.Q.close() means "I am done writing to this queue, and I don't care
about the rest of you", whereas the proposed meaning of q.Q.close() is
"Listen up, we are all done writing to this queue". I don't know yet
that this difference necessarily creates a true incompatibility.

That the effects (in terms of eager OS-resource cleanup) are different
shouldn't be a problem in itself - every implementation does the right
thing for itself.

--

On Mon, Oct 22, 2018 at 2:03 AM Terry Reedy  wrote:
> The proposed close method would only half-close the queue: closed to
> puts, open to gets (but perhaps close completely when the last item is
> gotten).

In other words: in this proposal, there is no such thing as "closed
for retrieval". A closed queue means exactly that it's closed for
insertion.

Retrieval becomes permanently impossible once the queue is closed and
exhausted, and that's a condition that get() must treat correctly and
usefully, but calling that condition "closed / completely closed /
closed for retrieval" would muddle up the terminology.

In the proposed implementation I've called it "exhausted", a name I've
picked up god-knows-when and from god-knows-where, but it seemed
reasonable.

--

Regarding sentinels in general: They are a limited-purpose solution,
and this proposal should make them unnecessary in 99% of the cases.

Firstly, they only naturally apply to FIFO queues. You could hack your
use of LIFO and priority queues to also rely on sentinels, but it's
very kludgey in the general case, not a natural fit, and not
generalizable to user-created children of Queue (which Queue otherwise
explicitly aspires to support).

Secondly, they only work when the producer is the one driving the flow
and notifying the consumer that "no more is forthcoming". They don't
work when the producer is the one who needs to be notified.

Thirdly, they're a potential cause of deadlocks when the same threads
act as both producers and consumers. (E.g. in a parallelized
breadth-first search.) I'm sure this is the circular flow that
Nathaniel was referring to, but I'll let him detail it more or correct
me.

Fourthly, they don't make it easy to query the Queue about whether
it's closed. This probably isn't a big deal admittedly.

Sure, when sentinels are adequate, they're adequate. This proposal
aims to be more general-purpose than that.


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Guido van Rossum
On Sun, Oct 21, 2018 at 6:08 PM Nathaniel Smith  wrote:
I'm not sure if this is an issue the way Queue is used in practice, but in
general you have to be careful with this kind of circular flow because if
your queue communicates backpressure (which it should) then circular flows
can deadlock.

Nathaniel, would you be able to elaborate more on the issue of
backpressure? I think a lot of people here are not really familiar with the
concepts and its importance, and it changes how you have to think about
queues and the like.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Cameron Simpson

On 21Oct2018 18:06, Nathaniel Smith  wrote:

On Sun, Oct 21, 2018, 16:48 MRAB  wrote:

On 2018-10-21 22:30, Antoine Pitrou wrote:
> Ah.  This is the one statement that makes me favorable to this 
> idea.

> When there is a single consumer, it's easy enough to send a sentinel.
> But when there are multiple consumers, suddenly you must send exactly
> the right number of sentinels (which means you also have to carefully
> keep track of their number, which isn't always easy).  There's some
> delicate code doing exactly that in concurrent.futures.
>
You don't need more than one sentinel. When a consumer sees the
sentinel, it just needs to put it back for the other consumers.


Yes, this is exactly what my own IterableQueue does.


I'm not sure if this is an issue the way Queue is used in practice, but in
general you have to be careful with this kind of circular flow because if
your queue communicates backpressure (which it should) then circular flows
can deadlock.


Haven't come across this myself. A closeable queue doesn't seem circular 
to me. The handling of the sentinel is internal to the IterableQueue, so 
external users never see it.


Cheers,
Cameron Simpson 


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Cameron Simpson

On 21Oct2018 21:19, Vladimir Filipović  wrote:

On Sun, Oct 21, 2018 at 8:45 PM MRAB  wrote:

FTR, this has been discussed before:
[Python-ideas] `__iter__` for queues?
https://mail.python.org/pipermail/python-ideas/2010-January/006711.html


Thank you!


Hmm, yes. My post there is this one:

 https://mail.python.org/pipermail/python-ideas/2010-January/006716.html

I want to point out that in my code a single consumer per Queue is
incredibly common, so common that I can't off the top of my head think
of _any_ uses of Queue directly: I _always_ make an IterableQueue and 
simply have the consumer iterate over the iterable queue.


This is _exactly_ like Vladimir's proposal to my mind: my IterableQueue 
is iterable, and has a .close method just like his (prevents further puts
and indicates end of stream), mediated with a sentinel internally.
Arbitrary number of putters, _usually_ only one getter but of course 
there's no need for that.


So to my mind his proposal is very simple and sensible, and matches 
almost universally my own use of Queues.
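
The core of such a class is roughly this (a minimal sketch along those
lines, not the actual IterableQueue code, and not fully race-free):

import queue

class IterableQueue(queue.Queue):

    _SENTINEL = object()

    def __init__(self, maxsize=0):
        super().__init__(maxsize)
        self._closed = False

    def close(self):
        # Prevent further put()s and mark the end of the stream.
        if not self._closed:
            self._closed = True
            super().put(self._SENTINEL)   # may block if a bounded queue is full

    def put(self, item, block=True, timeout=None):
        if self._closed:
            raise ValueError("put() on a closed queue")
        super().put(item, block, timeout)

    def __iter__(self):
        while True:
            item = self.get()
            if item is self._SENTINEL:
                super().put(self._SENTINEL)   # pass it on so other consumers also stop
                return
            yield item

Producers call q.close() when they're done; each consumer just writes
`for item in q: ...`.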


Cheers,
Cameron Simpson 


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Nathaniel Smith
On Sun, Oct 21, 2018, 16:48 MRAB  wrote:

> On 2018-10-21 22:30, Antoine Pitrou wrote:
> > On Sun, 21 Oct 2018 19:58:05 +0200
> > Vladimir Filipović 
> > wrote:
> >>
> >> To anticipate a couple more possible questions:
> >>
> >> - What would this proposal do about multiple producers/consumers
> >> needing to jointly decide _when_ to close the queue?
> >>
> >> Explicitly nothing.
> >>
> >> The queue's state is either closed or not, and it doesn't care who
> >> closed it. It needs to interact correctly with multiple consumers and
> >> multiple producers, but once any one piece of code closes it, the
> >> correct interaction is acting like a closed queue for everybody.
> >
> > Ah.  This is the one statement that makes me favorable to this idea.
> > When there is a single consumer, it's easy enough to send a sentinel.
> > But when there are multiple consumers, suddenly you must send exactly
> > the right number of sentinels (which means you also have to carefully
> > keep track of their number, which isn't always easy).  There's some
> > delicate code doing exactly that in concurrent.futures.
> >
> You don't need more than one sentinel. When a consumer sees the
> sentinel, it just needs to put it back for the other consumers.
>

I'm not sure if this is an issue the way Queue is used in practice, but in
general you have to be careful with this kind of circular flow because if
your queue communicates backpressure (which it should) then circular flows
can deadlock.

-n



Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread MRAB

On 2018-10-21 22:30, Antoine Pitrou wrote:

On Sun, 21 Oct 2018 19:58:05 +0200
Vladimir Filipović 
wrote:


To anticipate a couple more possible questions:

- What would this proposal do about multiple producers/consumers
needing to jointly decide _when_ to close the queue?

Explicitly nothing.

The queue's state is either closed or not, and it doesn't care who
closed it. It needs to interact correctly with multiple consumers and
multiple producers, but once any one piece of code closes it, the
correct interaction is acting like a closed queue for everybody.


Ah.  This is the one statement that makes me favorable to this idea.
When there is a single consumer, it's easy enough to send a sentinel.
But when there are multiple consumers, suddenly you must send exactly
the right number of sentinels (which means you also have to carefully
keep track of their number, which isn't always easy).  There's some
delicate code doing exactly that in concurrent.futures.

You don't need more than one sentinel. When a consumer sees the 
sentinel, it just needs to put it back for the other consumers.
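
Roughly like this (a minimal sketch with made-up names):

import queue
import threading

SENTINEL = object()
q = queue.Queue()

def consumer(name):
    while True:
        item = q.get()
        if item is SENTINEL:
            q.put(SENTINEL)    # put it back so the other consumers also see it
            break
        print(name, "processed", item)

workers = [threading.Thread(target=consumer, args=(f"worker-{i}",)) for i in range(3)]
for w in workers:
    w.start()

for item in range(10):
    q.put(item)
q.put(SENTINEL)                # a single sentinel is enough

for w in workers:
    w.join()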


[snip]


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Antoine Pitrou
On Sun, 21 Oct 2018 19:58:05 +0200
Vladimir Filipović 
wrote:
> 
> To anticipate a couple more possible questions:
> 
> - What would this proposal do about multiple producers/consumers
> needing to jointly decide _when_ to close the queue?
> 
> Explicitly nothing.
> 
> The queue's state is either closed or not, and it doesn't care who
> closed it. It needs to interact correctly with multiple consumers and
> multiple producers, but once any one piece of code closes it, the
> correct interaction is acting like a closed queue for everybody.

Ah.  This is the one statement that makes me favorable to this idea.
When there is a single consumer, it's easy enough to send a sentinel.
But when there are multiple consumers, suddenly you must send exactly
the right number of sentinels (which means you also have to carefully
keep track of their number, which isn't always easy).  There's some
delicate code doing exactly that in concurrent.futures.

> - Should multiprocessing.Queue do the same thing?
> 
> I think so, though I'm not proposing it.
> 
> It already has a close() method, whose meaning is very similar but not
> identical to (a subset of) the proposed threading.Queue.close's
> meaning (with resource-management effects not relevant to
> threading.Queue either way).

Not really similar, unfortunately.  mp.Queue.close() isn't a logical
thing, but releases the queue's internal resources.  It doesn't signal
consumers that the producer has finished with the queue.

Perhaps if you renamed close() to something else ("put_eof" or
"put_end" perhaps?) that would allow porting it to mp.Queue?

Regards

Antoine.




Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Nathaniel Smith
Hi Vladimir,

It's great to see people revisiting these old stdlib tools. Closure
tracking is definitely a big point of awkwardness for Queues. In Trio we
started with a straight copy of threading.Queue, and this turned out to be
a major friction point for users. We just deprecated our version of Queue
and replaced it with a new design. Our new thing is probably more radical
than you want to get in the stdlib (we ended up splitting the object into
two pieces, a sender object and a receiver object), but you might find the
discussions interesting:

Manual:
https://trio.readthedocs.io/en/latest/reference-core.html#using-channels-to-pass-values-between-tasks

A more minimal proposal to add closure tracking to trio.Queue:
https://github.com/python-trio/trio/pull/573

Follow-up issue with design questions we're still thinking about (also
links to earlier design discussions):
https://github.com/python-trio/trio/issues/719

We only started shipping this last week, so we're still getting experience
with it.

-n

On Sun, Oct 21, 2018, 10:59 Vladimir Filipović  wrote:

> Hi!
>
> I originally submitted this as a pull request. Raymond Hettinger
> suggested it should be given a shakeout in python-ideas first.
>
> https://github.com/python/cpython/pull/10018
> https://bugs.python.org/issue35034
>
> --
>
> Briefly:
>
> Add a close() method to Queue, which should simplify many common uses
> of the class and reduce the space for some easy-to-make errors.
>
> Also add an __iter__() method which in conjunction with close() would
> further simplify some common use patterns.
>
> --
>
> At eye-watering length:
>
> Apologies in advance for the length of this message. This isn't a PEP
> in disguise, it's a proposal for a very small, simple and I dare
> imagine uncontroversial feature. I'm new to contributing to Python and
> after the BPO/github submission I didn't manage to come up with a
> better way to present it than this.
>
> The issue
>
> Code using threading.Queue often needs to coordinate a "work is
> finished as far as I care" state between the producing and consuming
> side. Not "work" in the task_done() sense of completion of processing
> of queue items, "work" in the simpler sense of just passing data
> through the queue.
>
> For example, a producer can be driving the communication by enqueuing
> e.g. names of files that need to be processed, and once it's enqueued
> the last filename, it can be useful to inform the consumers that no
> further names will be coming, so after they've retrieved what's
> in-flight currently, they don't need to bother waiting for any more.
> Alternatively, a consumer can be driving the communication, and may
> need to let the producers know "I'm not interested in any more, so you
> can stop wasting resources on producing and enqueuing them".
> Also, a third, coordinating component may need to let both sides know
> that "Our collective work here is done. Start wrapping it up y'all,
> but don't drop any items that are still in-flight."
>
> In practice it's probably the exception, not the rule, when any piece
> of code interacting with a Queue _doesn't_ have to either inform
> another component that its interest in transferring the data has
> ended, or watch for such information.
>
> In the most common case of producer letting consumers know that it's
> done, this is usually implemented (over and over again) with sentinel
> objects, which is at best needlessly verbose and at worst error-prone.
> A recipe for multiple consumers making sure nobody misses the sentinel
> is not complicated, but neither is it obvious the first time one needs
> to do it.
> When a generic sentinel (None or similar) isn't adequate, some
> component needs to create the sentinel object and communicate it to
> the others, which complicates code, and especially complicates
> interfaces between components that are not being developed together
> (e.g. if one of them is part of a library and expects the library-user
> code to talk to it through a Queue).
>
> In the less common cases where the producers are the ones being
> notified, there isn't even a typical solution - everything needs to be
> cooked up from scratch using synchronization primitives.
>
> --
>
> A solution
>
> Adding a close() method to the Queue that simply prohibits all further
> put()'s (with other methods acting appropriately when the queue is
> closed) would simplify a lot of this in a clean and safe way - for the
> most obvious example, multi-consumer code would not have to juggle
> sentinel objects.
>
> Adding a further __iter__() method (that would block as necessary, and
> stop its iteration once the queue is closed and exhausted) would
> especially simplify many unsophisticated consumers.
>
> This is a current fairly ordinary pattern:
>
> # Producer:
> while some_condition:
> q.put(generate_item())
> q.put(sentinel)
>
> # Consumer:
> while True:
> item = q.get()
> if item == sentinel:
> q.put(sentinel)
> break

Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread Vladimir Filipović
On Sun, Oct 21, 2018 at 8:45 PM MRAB  wrote:
> FTR, this has been discussed before:
>
> [Python-ideas] `__iter__` for queues?
> https://mail.python.org/pipermail/python-ideas/2010-January/006711.html

Thank you!

For the sake of clarity, I want to outline a few differences between
that discussion and my proposal:

1. Much of the discussion there seemed to implicitly limit itself to
consideration of FIFO queues. This proposal works cleanly for child
classes too, including any (API-compliant) user-written children.

2. Throughout that discussion, iteration is the A feature, and closing
is occasionally mentioned as a possible prerequisite. In this
proposal, the A feature is closing, which enables sensible iteration
(as a B feature) but is useful even if iteration isn't used.

3. There's naturally a lot of quick spitballing of various
mutually-incompatible ideas there, whereas this is one rounded
self-consistent proposal. Most of what I've come up with has already
been anticipated there but it's all mixed up textually.

4. This proposal sidesteps a lot of the doubts and difficulties by
just not using sentinels at all. Being closed is part of the queue's
state that can be queried at any time, and will affect put() calls
immediately, without waiting for a sentinel to float up to the front.
(With recognition that your (MRAB's) message towards that thread's end
already proposed the same approach.)
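
To make the contrast with the usual sentinel pattern concrete, the
intended usage under this proposal is roughly the following (hypothetical
code, since close() and __iter__ are exactly what's being proposed;
generate_item() and process() are placeholders):

# Producer:
while some_condition:
    q.put(generate_item())
q.close()

# Consumer (any number of them, unchanged):
for item in q:
    process(item)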


Re: [Python-ideas] Add closing and iteration to threading.Queue

2018-10-21 Thread MRAB

On 2018-10-21 18:58, Vladimir Filipović wrote:

Hi!

I originally submitted this as a pull request. Raymond Hettinger
suggested it should be given a shakeout in python-ideas first.

https://github.com/python/cpython/pull/10018
https://bugs.python.org/issue35034


[snip]
FTR, this has been discussed before:

[Python-ideas] `__iter__` for queues?
https://mail.python.org/pipermail/python-ideas/2010-January/006711.html