Re: [viff-devel] Mystery of the quadratic running time solved?

2009-03-08 Thread Marcel Keller

You're talking about this two-threaded solution as if it is something
that exists and will solve all our problems...


No, for now it exists only in my mind: a proposal for the next meeting, 
and a strong feeling that it's the right way to do it.



But I still haven't seen it, and I would like to see how you can cleanly
separate the work of the two threads. I'm afraid that the threads will
be alternating between working and waiting on the other thread in such a
way that we could just as well use one thread.


My idea is that the Twisted main loop runs in one thread and most of the 
VIFF code in the other. The Twisted thread only waits for I/O and the 
VIFF thread only waits if there is no work to be done.


If the group decides to give this idea a try, I would be happy to 
implement it. I would claim that it can be done in one or two weeks.
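The proposed split could be sketched, very roughly, as a plain producer/consumer pair; everything below is a hypothetical illustration and uses no actual VIFF or Twisted API:

```python
# Hypothetical sketch of the two-threaded design: one thread stands in
# for the Twisted reactor (I/O only), the other for the VIFF computation.
import queue
import threading

work_queue = queue.Queue()   # shares arriving "from the network"
results = []

def viff_thread():
    # The VIFF thread blocks only when there is no work to do.
    while True:
        share = work_queue.get()   # blocks; no busy-waiting
        if share is None:          # sentinel: shut down
            break
        results.append(share * 2)  # stand-in for processing a share

def twisted_thread():
    # The reactor thread would block on select(); here we just feed data.
    for share in range(5):
        work_queue.put(share)
    work_queue.put(None)

worker = threading.Thread(target=viff_thread)
worker.start()
twisted_thread()
worker.join()
print(results)  # [0, 2, 4, 6, 8]
```

In the real design the reactor thread would hand incoming shares over with something like `queue.put` from a protocol callback, and the VIFF thread would push outgoing data back via `reactor.callFromThread`.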



Also, threading in Python is unfortunately not optimal :-( The Python VM
will use a single thread to execute the bytecode, even when using
multiple threads in Python. You will only see a performance increase
from multiple threads if they do things that block on I/O.


I'm aware that Python runs on only one CPU at a time; a friend pointed that 
out to me today. However, the Twisted thread mentioned above would block on I/O.


___
viff-devel mailing list (http://viff.dk/)
viff-devel@viff.dk
http://lists.viff.dk/listinfo.cgi/viff-devel-viff.dk


Re: [viff-devel] Multiparty AES in less than 3 seconds per block thanks to Twisted hack

2009-03-08 Thread Martin Geisler
Marcel Keller  writes:

>> [...] Maybe your short patch didn't provide enough information when
>> taken out of context.
>
> That may be true. I just thought that I don't want to bother him with
> VIFF code.

Right, but I think we got a nice reply from him when I followed up with
more details.

>> Do you mean the inlineCallbacks or the coiterate? And why does it
>> not work?
>
> As far as I understood it, inlineCallbacks would stop the VIFF code,
> which would make it not asynchronous anymore. Coiterate needs
> generator functions. Maybe I'm wrong but I just consider the VIFF code
> to be too complex to be put in generator functions.

Please see the example posted by Jean-Paul Calderone for use with
coiterate. It looks better than I had expected and I think it's nice to
see what our alternatives are.

But it is clear that our illusion that "x * y" is just like a normal
multiplication begins to break down.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.




Re: [viff-devel] Mystery of the quadratic running time solved?

2009-03-08 Thread Martin Geisler
Marcel Keller  writes:

 Indeed we did not know (well I didn't) back then that the data was
 not sent immediately by Twisted, and I was starting to think
 yesterday whether the hack would make a difference. Lucky for us, it
 apparently does :)
>>> That is not the only problem. To free the memory of the shares and to
>>> send out further shares, the incoming shares must also be processed
>>> as soon as possible. This is even trickier because incoming shares
>>> might trigger code that calls functions sending out data, which
>>> activates the Twisted reactor again and therefore leads to possibly
>>> too deep recursion. I think I have a solution for that; it just wasn't
>>> necessary to implement it for now because the hack worked anyway.
>>
>> I guess we could simply not recurse if the recursion depth is too big?
>>
>> At some point we have to let the recursive calls finish in order to let
>> the local variables and stack frames be garbage collected.
>
> Yes, but if we just stop recursing at a certain level, we might just
> stay at that level for a longer time. That is also not optimal. In my
> opinion, that is another point for the two-threaded solution.

You're talking about this two-threaded solution as if it is something
that exists and will solve all our problems...

But I still haven't seen it, and I would like to see how you can cleanly
separate the work of the two threads. I'm afraid that the threads will
be alternating between working and waiting on the other thread in such a
way that we could just as well use one thread.

Also, threading in Python is unfortunately not optimal :-( The Python VM
will use a single thread to execute the bytecode, even when using
multiple threads in Python. You will only see a performance increase
from multiple threads if they do things that block on I/O.

Please see these links for some discussion on this so-called Global
Interpreter Lock (GIL):

  http://www.artima.com/weblogs/viewpost.jsp?thread=214235
  
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
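The practical upshot of the GIL can be seen in a small, self-contained timing sketch: threads release the GIL while blocking, so I/O-style waits do overlap (here `time.sleep` stands in for I/O):

```python
# Five threads each "wait on I/O" for 0.2 s. Despite the GIL they run
# concurrently, so the total wall time is roughly 0.2 s, not 1.0 s.
import threading
import time

def blocking_io():
    time.sleep(0.2)  # releases the GIL while "waiting on I/O"

start = time.time()
threads = [threading.Thread(target=blocking_io) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(elapsed)
```

CPU-bound bytecode in the same experiment would show no such speedup, which is exactly the limitation discussed above.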

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.




Re: [viff-devel] Multiparty AES in less than 3 seconds per block thanks to Twisted hack

2009-03-08 Thread Marcel Keller

For those who are not on the Twisted mailing list, the reply is here:

  http://twistedmatrix.com/pipermail/twisted-python/2009-February/019252.html

There Jean-Paul Calderone says that he doesn't believe in a re-entrant
reactor, but he does not explain in detail why that is so.


I think there are good reasons not to believe in a re-entrant reactor.


My guess is that a generally re-entrant reactor could end up doing a lot
of recursion where each recursion step holds onto a local scope and
thereby keeps local variables from being reclaimed by the garbage
collector. A simple loop does not have that problem, so I can understand
why the Twisted guys will want to keep the design as simple as possible.


Me too. I think I mentioned at some point that I didn't expect them to 
accept my hack.



I'll try and write a mail to them to explain our problem in more detail.
Maybe your short patch didn't provide enough information when taken out
of context.


That may be true. I just thought that I don't want to bother him with 
VIFF code.



- It breaks some unit tests. I'm not sure whether it really breaks
functionality or just the unit testing tool of Twisted.


Are these VIFF (trial viff) or Twisted (trial twisted) unit tests? In
any case, we have to fix this in order to keep our sanity :-)


The VIFF unit tests. Though I didn't execute the Twisted unit tests, I 
think they would succeed, since there are no changes to Twisted if the 
loop call is not used.



As I said at the meeting, a possibility would be to go multi-threaded.
The Twisted maintainer suggested another way but I don't think that
that way works for us.


Do you mean the inlineCallbacks or the coiterate? And why does it not work?


As far as I understood it, inlineCallbacks would stop the VIFF code, 
which would make it not asynchronous anymore. Coiterate needs generator 
functions. Maybe I'm wrong but I just consider the VIFF code to be too 
complex to be put in generator functions.



Re: [viff-devel] Mystery of the quadratic running time solved?

2009-03-08 Thread Marcel Keller

Indeed we did not know (well I didn't) back then that the data was
not sent immediately by Twisted, and I was starting to think
yesterday whether the hack would make a difference. Lucky for us, it
apparently does :)

That is not the only problem. To free the memory of the shares and to
send out further shares, the incoming shares must also be processed as
soon as possible. This is even trickier because incoming shares might
trigger code that calls functions sending out data, which activates
the Twisted reactor again and therefore leads to possibly too deep
recursion. I think I have a solution for that; it just wasn't
necessary to implement it for now because the hack worked anyway.


I guess we could simply not recurse if the recursion depth is too big?

At some point we have to let the recursive calls finish in order to let
the local variables and stack frames be garbage collected.


Yes, but if we just stop recursing at a certain level, we might just 
stay at that level for a longer time. That is also not optimal. In my 
opinion, that is another point for the two-threaded solution.



Re: [viff-devel] Mystery of the quadratic running time solved?

2009-03-08 Thread Martin Geisler
Marcel Keller  writes:

>> I think we would get the same result if we started a LoopingCall that
>> executes process_deferred_queue with an interval of, say, 100 ms:
>>
>>   
>> http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.task.LoopingCall.html
>>
>> This should work since the runUntilCurrent method runs through the
>> waiting calls and will trigger our process_deferred_queue method.
>>
>> And voilà -- no hacking of the Twisted source needed.
>
> I'm not sure, but LoopingCall._reschedule() looks more like it
> schedules the calls at a certain tick, not as soon as possible after
> the interval has elapsed. This might not cost too much time but still
> doesn't feel very elegant. Furthermore, setting the interval very low
> leads to high CPU usage when waiting. Again, this is not too bad but
> not elegant either. The same applies if using reactor.callLater()
> directly.

A looping call is just a higher-level wrapper for doing

  def reschedule(func):
      func()
      reactor.callLater(interval, reschedule, func)

  reschedule(func)

It will execute the function when the (now + interval) time has been
reached and when the control flow returns to the reactor's event loop.
We probably won't need the extra logic in a looping call, so we can just
let the function reschedule itself like above.

If we do this with an interval of 0, then the function will be called on
each iteration through the reactor's event loop -- just like your
loopCall I believe?
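For intuition, the behaviour of the patched main loop can be mimicked without Twisted at all; the names below echo the patch but are purely illustrative:

```python
# Toy, Twisted-free sketch of the patched main loop: each iteration
# handles "I/O", then runs the registered loopCall.
class ToyReactor:
    def __init__(self):
        self.loop_call = None
        self.iterations = 0

    def set_loop_call(self, f, *args, **kw):
        # Mirrors the setLoopCall() added by the patch.
        self.loop_call = (f, args, kw)

    def my_iteration(self):
        self.iterations += 1  # stands in for runUntilCurrent/doIteration
        if self.loop_call is not None:
            f, args, kw = self.loop_call
            f(*args, **kw)

processed = []
reactor = ToyReactor()
# The loopCall stands in for runtime.process_deferred_queue.
reactor.set_loop_call(lambda: processed.append(reactor.iterations))

for _ in range(3):  # three trips through the "main loop"
    reactor.my_iteration()
print(processed)  # [1, 2, 3]
```

The point of contention is only whether this per-iteration hook is better expressed as a patch to the reactor or as a self-rescheduling callLater(0) as above.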

> Of course, we can avoid hacking the Twisted code by extending it
> within VIFF. Still, I'm in favor of the two-threaded solution because
> it's more elegant, doesn't have the danger of recursing too deep, and,
> in my opinion, it should be feasible.

I have not yet seen the two-threaded approach -- do you have a patch?

>>> diff -r e2759515f57f viff/runtime.py
>>> --- a/viff/runtime.py   Thu Mar 05 21:02:57 2009 +0100
>>> +++ b/viff/runtime.py   Fri Mar 06 13:43:14 2009 +0100
>>> @@ -306,6 +306,8 @@
>>>              deferred = deq.popleft()
>>>              if not deq:
>>>                  del self.incoming_data[key]
>>> +            # just queue
>>> +            self.factory.runtime.queue_deferred(deferred)
>>
>> Why is this done?
>
> At this time, we shouldn't call the callbacks because we might recurse
> into selectreactor.doSelect(). However, we want to know which
> deferreds are ready so we can call deferred.callback() later.

Uhh, this sounds a bit dangerous -- do we know exactly which Deferreds
we can invoke callback on and which we cannot? As I remember, we invoke
callback in a couple of other locations, is that safe?

>> If that doesn't matter, then I think this would be faster:
>>
>>   queue, self.deferred_queue = self.deferred_queue, []
>>   map(Deferred.unpause, queue)
>>
>> My idea is that looping over the list with map is faster than
>> repeatedly popping items from the beginning (an O(n) operation).
>
> But map() still would need O(n) time because that is the nature of
> calling a function n times, isn't it? Maybe the function calls are
> optimized but the code in the function still is called n times.

Each pop(0) call is an O(n) operation, so we get O(n^2) here -- it is an
expensive way to loop through a list. And now that I look at it, using
map will still unpause the Deferreds in the same order you added them.

The difference is then that anything added to the queue as a result of
the unpause calls will be processed the next time the code is called.
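The cost difference under discussion can be sketched in isolation; the helper names below are illustrative, not VIFF code:

```python
# pop(0) shifts every remaining element, so draining n items costs
# O(n^2); swapping the list out first makes the drain a plain O(n) loop.
from collections import deque

def drain_pop0(items):
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))  # O(n) per pop
    return out

def drain_swap(holder):
    # holder mimics self.deferred_queue: swap in a fresh list, then
    # walk the old one -- same order as repeated pop(0).
    queue, holder[:] = holder[:], []
    return list(queue)

items = [1, 2, 3, 4]
assert drain_pop0(items) == drain_swap(list(items)) == [1, 2, 3, 4]

# collections.deque.popleft() is O(1) and keeps the incremental style.
dq = deque(items)
print([dq.popleft() for _ in range(len(dq))])  # [1, 2, 3, 4]
```

A deque would give the best of both: O(1) removal from the front while still allowing items queued during processing to be handled in the same pass.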

>> A question springs to my mind: calling
>>
>>   reactor.runUntilCurrent()
>>   reactor.doIteration(0)
>>
>> is the same as calling
>>
>>   reactor.iterate(0)
>>
>> and the documentation for that method says:
>>
>>   [...] This method is not re-entrant: you must not call it recursively;
>>   in particular, you must not call it while the reactor is running.
>>
>> How does your code ensure that we only call myIteration when we're
>> not in a call made by the reactor? And could we simply call
>> reactor.iterate instead?
>
> We actually call it recursively, but it should be reentrant if it's
> not called from doIteration(). doIteration() is the same as
> selectreactor.doSelect(), which certainly is not reentrant. We however
> call it from the loop call (process_deferred_queue()) after
> doIteration().
>
> Calling reactor.iterate() is not enough because it doesn't call
> process_deferred_queue().

So if we call reactor.iterate and then runtime.process_deferred_queue as

  reactor.callLater(0, reactor.iterate)
  reactor.callLater(0, runtime.process_deferred_queue)

we should be fine? They would then both run one after another just as
soon as the event loop is reached.

My goal is to violate as few Twisted docstrings as possible :-) And to
use the tools provided by Twisted as much as possible.

I would also like to hear what the Twisted guys have to say about
calling reactor.iterate like this; it would be nice if they could say
that it *is* in fact safe to do it like this.

> The prin

Re: [viff-devel] Mystery of the quadratic running time solved?

2009-03-08 Thread Marcel Keller

Wow, this is nice! I had sort of given up finding the cause of this :-(
Thank you for looking at this, and just in time for my presentation at
PKC in 10 days :-)


You're welcome. :-)


--- /usr/lib/python2.5/site-packages/twisted/internet/base.py  2008-07-29 22:13:54.0 +0200
+++ internet/base.py    2009-02-20 12:27:42.0 +0100
@@ -1127,17 +1127,32 @@
         self.startRunning(installSignalHandlers=installSignalHandlers)
         self.mainLoop()
 
+
+    def setLoopCall(self, f, *args, **kw):
+        self.loopCall = (f, args, kw)
+
+
+    def myIteration(self, t):
+        # Advance simulation time in delayed event
+        # processors.
+        self.runUntilCurrent()
+
+        if (t is None):
+            t2 = self.timeout()
+            t = self.running and t2
+
+        self.doIteration(t)
+
+        if ("loopCall" in dir(self)):
+            f, args, kw = self.loopCall
+            f(*args, **kw)
+
 
     def mainLoop(self):
         while self._started:
             try:
                 while self._started:
-                    # Advance simulation time in delayed event
-                    # processors.
-                    self.runUntilCurrent()
-                    t2 = self.timeout()
-                    t = self.running and t2
-                    self.doIteration(t)
+                    self.myIteration(None)
             except:
                 log.msg("Unexpected error in main loop.")
                 log.err()


The changes above basically insert a call to self.loopCall after each
doIteration invocation, right?

When the loopCall is process_deferred_queue the main loop becomes:

  self.runUntilCurrent()
  self.doIteration(t)
  runtime.process_deferred_queue()

  self.runUntilCurrent()
  self.doIteration(t)
  runtime.process_deferred_queue()

  ...


Yes, exactly.


I think we would get the same result if we started a LoopingCall that
executes process_deferred_queue with an interval of, say, 100 ms:

  
http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.task.LoopingCall.html

This should work since the runUntilCurrent method runs through the
waiting calls and will trigger our process_deferred_queue method.

And voilà -- no hacking of the Twisted source needed.


I'm not sure, but LoopingCall._reschedule() looks more like it schedules 
the calls at a certain tick, not as soon as possible after the interval 
has elapsed. This might not cost too much time but still doesn't feel 
very elegant. Furthermore, setting the interval very low leads to high 
CPU usage when waiting. Again, this is not too bad but not elegant 
either. The same applies if using reactor.callLater() directly.


Of course, we can avoid hacking the Twisted code by extending it within 
VIFF. Still, I'm in favor of the two-threaded solution because it's more 
elegant, doesn't have the danger of recursing too deep, and, in my 
opinion, it should be feasible.



diff -r e2759515f57f viff/runtime.py
--- a/viff/runtime.py   Thu Mar 05 21:02:57 2009 +0100
+++ b/viff/runtime.py   Fri Mar 06 13:43:14 2009 +0100
@@ -306,6 +306,8 @@
             deferred = deq.popleft()
             if not deq:
                 del self.incoming_data[key]
+            # just queue
+            self.factory.runtime.queue_deferred(deferred)


Why is this done?


At this time, we shouldn't call the callbacks because we might recurse 
into selectreactor.doSelect(). However, we want to know which deferreds 
are ready so we can call deferred.callback() later.



+        #: Activation depth counter
+        self.depth_counter = 0


This is for keeping track of the recursion depth in the future?


Yes. It was used in some debug output earlier but I removed it to 
simplify the patch.



+    def queue_deferred(self, deferred):
+        deferred.pause()
+        self.deferred_queue.append(deferred)
+
+    def process_deferred_queue(self):
+        while(self.deferred_queue):
+            deferred = self.deferred_queue.pop(0)
+            deferred.unpause()


Are you doing it this way to ensure that the Deferreds are unpaused in
the same order as they were added to the list?


Yes. I'm not sure whether this is really necessary, but it seems just 
cleaner, because the callback of some deferred might do a lot of 
computation and recurse, which would unnecessarily extend the lifetime 
of the remaining deferreds.



If that doesn't matter, then I think this would be faster:

  queue, self.deferred_queue = self.deferred_queue, []
  map(Deferred.unpause, queue)

My idea is that looping over the list with map is faster than repeatedly
popping items from the beginning (an O(n) operation).


But map() still would need O(n) time because that is the nature of 
calling a function n times, isn't it? Maybe the function calls are 
optimized but the code in the function still is called n times.



+    def activate_reactor(self):
+        self.activation_counter += 1
+
+        # setting the number to n makes the reactor called