Hendrik van Rooyen wrote:
"Roel Schroeven" <rschroev_nospam...@fastmail.fm> wrote:
Hendrik van Rooyen wrote:
I have always wondered why people do the one queue many getters thing.
Because IMO it's the simplest and most elegant solution.
That is fair enough...
Given that the stuff you pass is homogeneous, in that it will require a
similar amount of effort to process, is there not a case to be made
to have as many queues as consumers, and to round-robin the work?
Could work if the processing time for each work unit is exactly the same
(otherwise one or more consumers will be idle part of the time), but in
most cases that is not guaranteed. A simple example is fetching data
over the network: even if the data size is always the same, there will
be differences because of network load variations.
If you use one queue, each consumer fetches a new work unit as soon as it
has consumed the previous one. All consumers will be working as long as
there is work to do, without having to write any code to do the load
balancing.
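In Python this pattern takes only a few lines with the standard queue module. A minimal sketch (the worker function, names, and fake workload are illustrative, not from any poster's actual code); each consumer blocks on the shared queue and a sentinel value per consumer signals shutdown:

```python
import queue
import random
import threading
import time

def worker(q, results):
    # Each consumer fetches a new work unit as soon as it has
    # consumed the previous one; no load-balancing code needed.
    while True:
        item = q.get()
        if item is None:        # sentinel: shut this consumer down
            break
        time.sleep(random.uniform(0.001, 0.01))  # variable processing time
        results.append(item * 2)

work_queue = queue.Queue()
results = []
consumers = [threading.Thread(target=worker, args=(work_queue, results))
             for _ in range(4)]
for t in consumers:
    t.start()

# The producer neither knows nor cares which consumer gets which item.
for unit in range(10):
    work_queue.put(unit)
for _ in consumers:
    work_queue.put(None)        # one sentinel per consumer

for t in consumers:
    t.join()
print(sorted(results))          # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Whichever consumer finishes first simply picks up the next item, so the load balances itself.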
With one queue for each consumer, you either have to assume that the
average processing time is the same (otherwise some consumers will be
idle at the end, while others are still busy processing work units), or
you need some clever code in the producer(s) or the driving code to
balance the loads. That's extra complexity for little or no benefit.
I like the simplicity of having one queue: the producer(s) put work
units on the queue with no concern which consumer will process them or
how many consumers there even are; likewise the consumer(s) don't know
and don't need to know where their work units come from. And the work
gets automatically distributed to whichever consumer has first finished
its previous work unit.
This is all true in the case of a job that starts, runs and finishes.
I am not so sure it applies to something that has a long life.
And if the stuff you pass around needs disparate effort to consume,
it seems to me that you can more easily balance the load by having
specialised consumers, instead of instances of one humungous "I can
eat anything" consumer.
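For contrast, the one-queue-per-consumer arrangement could be sketched like this (names are hypothetical): note that the producer now has to know about every queue, and a backlog on one queue cannot be drained by the other consumers:

```python
import itertools
import queue

# One queue per consumer (illustrative sketch only).
consumer_queues = [queue.Queue() for _ in range(3)]
dispatch = itertools.cycle(consumer_queues)  # round-robin dispatcher

# The producer must know about every queue to deal the work out.
for unit in range(9):
    next(dispatch).put(unit)

# Each consumer only ever sees its own queue, so if the items on one
# queue turn out to be slow, the other consumers cannot help.
batches = []
for i, q in enumerate(consumer_queues):
    batch = [q.get() for _ in range(q.qsize())]
    batches.append(batch)
    print(i, batch)
# 0 [0, 3, 6]
# 1 [1, 4, 7]
# 2 [2, 5, 8]
```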
If there is a semantic difference, maybe yes; but I think it makes no
sense to differentiate purely on the expected execution times.
The idea is basically that you have the code that classifies in one
place only, instead of having it run in every instance of the consumer.
Feels better to me, somehow.
<snip>
If the classifying you're doing is just based on expected time to
consume the item, then I think your plan to use separate queues is
misguided.
If the consumers are interchangeable in their abilities, then feeding
them from a single queue is more efficient in average wait time,
worst-case wait time, and consumer utilization, in nearly all
non-pathological scenarios. Think of the line at the bank. There's a good
reason they now have a single line for multiple tellers. If you have
five tellers, and one of your transactions is really slow, the rest of
the line is only slowed down by 20%, rather than a few people being
slowed down by a substantial amount because they happen to be behind the
slowpoke. 30 years ago, they'd have individual lines, and I tried in
vain to explain queuing theory to the bank manager.
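The effect is easy to reproduce numerically. Here is a toy simulation (my own illustration, not from the thread): twenty transactions of one minute each, except for one twenty-minute slowpoke, served by five tellers either from one shared line or from five round-robin lines:

```python
import heapq

def single_line(service_times, tellers):
    # One shared line: each customer goes to the teller who frees up first.
    free_at = [0.0] * tellers
    heapq.heapify(free_at)
    waits = []
    for s in service_times:
        start = heapq.heappop(free_at)   # earliest-available teller
        waits.append(start)
        heapq.heappush(free_at, start + s)
    return sum(waits) / len(waits)

def separate_lines(service_times, tellers):
    # Individual lines: customers are dealt out round-robin in advance.
    free_at = [0.0] * tellers
    waits = []
    for i, s in enumerate(service_times):
        k = i % tellers
        waits.append(free_at[k])
        free_at[k] += s
    return sum(waits) / len(waits)

# Nineteen one-minute transactions and one twenty-minute slowpoke.
jobs = [1.0] * 10 + [20.0] + [1.0] * 9
print(single_line(jobs, 5))     # average wait: 1.55 minutes
print(separate_lines(jobs, 5))  # average wait: 2.45 minutes
```

With the shared line, only the slowpoke's own teller is tied up and everyone else flows past; with pre-assigned lines, everyone dealt in behind the slowpoke waits the full twenty minutes.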
Having said that, notice that sometimes the consumers in a computer are
not independent. If you're running 20 threads with this model, on a
processor with only two cores, and if the tasks are CPU bound, you're
wasting lots of thread-management time without gaining anything.
Similarly, if there are other shared resources that all the threads trip
over, it may not pay to do as many of them in parallel. But for
homogeneous consumers, do them with a single queue, and do benchmarks to
determine the optimum number of consumers to start.
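The benchmark can be as simple as timing the same batch of work at several consumer counts. A hedged sketch (the task body and sizes are placeholders; for CPU-bound pure-Python work the GIL means extra threads may not help, so a process pool may be the better fit):

```python
import concurrent.futures
import time

def consume(n):
    # Placeholder work unit; substitute the real consumer body here.
    return sum(i * i for i in range(n))

def timed_run(workers, jobs=40, size=50_000):
    # ThreadPoolExecutor feeds all its workers from one shared internal
    # queue -- the same one-queue-many-getters arrangement discussed above.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(consume, [size] * jobs))
    return time.perf_counter() - start

for w in (1, 2, 4, 8):
    print(f"{w} consumers: {timed_run(w):.3f} s")
```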
--
http://mail.python.org/mailman/listinfo/python-list