Martin Geisler m...@daimi.au.dk writes:
I'll try and write a mail to them to explain our problem in more
detail. Maybe your short patch didn't provide enough information when
taken out of context.
I've included a larger answer from Jean-Paul Calderone below -- you can
read in context here:
http://thread.gmane.org/gmane.comp.python.twisted/18029/focus=18089
Jean-Paul Calderone exar...@divmod.com writes:
On Sat, 07 Mar 2009 19:38:46 +0100, Martin Geisler m...@daimi.au.dk wrote:
Jean-Paul Calderone exar...@divmod.com writes:
Hi,
Thanks for the answer. I'm also with the VIFF project and I would
like to explain a bit more about the background for the hack by
Marcel.
[snip]
So you're doing a ton of work all at once now and you want to split
up that ton of work into smaller pieces and do it a little at a
time?
Sort of. We have overloaded the arithmetic operators in our library,
so people will expect to be able to write
# xs and ys are big lists of our objects
dot_product
for (x, y) in zip(xs, ys):
dot_product += x * y
Here the multiplications involves network traffic and return
Deferreds. We would like the network traffic for the first
multiplication to begin immediately, *before* the remaining
multiplications are done.
Doing all the multiplications up front makes the code block the
reactor and uses an awful lot of RAM. If we let each multiplication
trigger the sending of its data immediately, and if we process
incoming messages along the way, memory can be reclaimed for the
earlier multiplications and the above calculation should run in
constant memory.
Hm. I would bet the constant would be pretty high, though. Things
will only balance out once the network is keeping up with the local
for loop. Actually, I'm not sure why it would be constant at all.
Won't the local for loop always run much faster than any network
operations can happen? In this case, memory usage will go towards
K(local) * set size - K(remote) * set size, or just O(set size); that
is to say, linear.
Sending and processing data in a more even flow makes our benchmark
results better and more consistent from one run to the next.
It seems like what you'll really benefit from most is pipelining with
a maximum pipeline depth that's not too big (so as to conserve memory)
but not too small (so as to avoid wasting time on network round trip
time).
If that's the case, then you don't need to modify the reactor, you
just need to split up the work your code is going. There are a lot
of techniques for doing this. coiterate and inlineCallbacks are two
solutions which are closest to cookie cutter (ie, you have the
least flexibility in deciding how to use them).
Right -- we might be able to use these techniques. I haven't looked
at coiterate yet. With inlineCallbacks I guess the code would look
something like this:
# xs and ys are big lists of our objects
dot_product
for (x, y) in zip(xs, ys):
dot_product += (yield x * y)
which is not so bad, expect that it destroys the nice illusion that x
and y behave like normal integers even though the multiplication
involves network traffic.
Perhaps with coiterate this might look something like...
def op(xs, ys):
dot_product = 0
for (x, y) in zip(xs, ys):
dot_product += x * y
yield
yield dot_product
d = coiterate(op(xs, ys))
This is flawed, but maybe it can be fixed. What's good about it is
that it doesn't try to drive the reactor re-entrantly. :) It also
avoids the yield in the += and * operations, which somewhat preserves
your illusion of normalcy (I'm not saying I like that illusion, but
I'll leave that aside for now).
What's bad about it is that it still lets the local loop run
arbitrarily far ahead of the results from the network, giving you
unbounded memory usage. To fix that, it should yield a Deferred every
once in a while. The reason I leave it flawed here is that it's a
little tricky to figure out which Deferred to yield. What would be
great would be to yield the Deferred for an operation which preceeds
the most recently executed operation by some number of operations.
The number of operations by which it preceeds the most recent will be
your pipeline depth (roughly). The effect this has on coiterate is to
cause your local loop to cease to execute until that Deferred has
fired. By making it a Deferred for an operation some time /before/
the most recent operation, you keep the network busy and avoid wasting
time on round trip times. Once the Deferred fires, your loop gets run
a few more times which has the effect of putting more work into the
pipeline - until you've got enough extra work to keep things from
stalling again, and then coiterate puts you to sleep for a while
again.
You have a very long, steep, uphill battle to convince me that
adding support for re-entrant iteration is a good idea.
One problem I can think of is