Re: [viff-devel] Mystery of the quadratic running time solved?
> You're talking about this two-threaded solution as if it is something
> that exists and will solve all our problems...

No, for now it's just an idea in my mind, a proposal for the next meeting, and a strong feeling that it's the right way to do it.

> But I still haven't seen it, and I would like to see how you can
> cleanly separate the work of the two threads. I'm afraid that the
> threads will be alternating between working and waiting on the other
> thread in such a way that we could just as well use one thread.

My idea is that the Twisted main loop runs in one thread and most of the VIFF code in the other. The Twisted thread only waits for I/O, and the VIFF thread only waits if there is no work to be done. If the group decides to give this idea a try, I would be happy to implement it. I would claim that it can be done in one or two weeks.

> Also, threading in Python is unfortunately not optimal :-( The Python
> VM will use a single thread to execute the bytecode, even when using
> multiple threads in Python. It is only if the threads do things
> blocking for I/O that you will see a performance increase by using
> multiple threads.

I'm aware that Python only runs on one CPU; a friend pointed that out to me today. However, the Twisted thread mentioned above would block on I/O.

___
viff-devel mailing list (http://viff.dk/)
viff-devel@viff.dk
http://lists.viff.dk/listinfo.cgi/viff-devel-viff.dk
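The proposed division of labour can be sketched with Python's standard threading and queue modules. This is only an illustration of the producer/consumer split (the "Twisted thread" feeding work, the "VIFF thread" blocking only when idle), not VIFF code; all names here, including the squaring stand-in for share processing, are made up:

```python
import queue
import threading

def two_thread_sketch(items):
    """Sketch of the proposed split: an 'I/O' thread feeds work to a
    'VIFF' worker thread.  The worker blocks on the queue only when
    idle, mirroring the proposal that the Twisted thread only waits on
    I/O and the VIFF thread only waits when there is no work."""
    work = queue.Queue()
    results = []

    def viff_worker():
        while True:
            share = work.get()       # blocks only when there is no work
            if share is None:        # sentinel: shut the worker down
                break
            results.append(share * share)  # stand-in for share processing

    worker = threading.Thread(target=viff_worker)
    worker.start()
    for item in items:               # stand-in for the Twisted I/O loop
        work.put(item)
    work.put(None)
    worker.join()
    return results
```

The sentinel value is one simple way to let the worker terminate cleanly instead of blocking forever on an empty queue.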
Re: [viff-devel] Multiparty AES in less than 3 seconds per block thanks to Twisted hack
Marcel Keller writes:

>> [...] Maybe your short patch didn't provide enough information when
>> taken out of context.
>
> That may be true. I just thought that I don't want to bother him with
> VIFF code.

Right, but I think we got a nice reply from him when I followed up with more details.

>> Do you mean the inlineCallbacks or the coiterate? And why does it
>> not work?
>
> As far as I understood it, inlineCallbacks would stop the VIFF code,
> which would make it not asynchronous anymore. Coiterate needs
> generator functions. Maybe I'm wrong, but I just consider the VIFF
> code to be too complex to be put in generator functions.

Please see the example posted by Jean-Paul Calderone for use with coiterate. It looks better than I had expected, and I think it's nice to see what our alternatives are. But it is clear that our illusion that "x * y" is just like a normal multiplication begins to break down.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
Re: [viff-devel] Mystery of the quadratic running time solved?
Marcel Keller writes:

>>>> Indeed we did not know (well I didn't) back then that the data was
>>>> not sent immediately by Twisted, and I was starting to think
>>>> yesterday whether the hack would make a difference. Lucky for us,
>>>> it apparently does :)
>>>
>>> That is not the only problem. To free the memory of the shares and
>>> to send out further shares, also the incoming shares must be
>>> processed as soon as possible. This is even trickier because
>>> incoming shares might trigger code that calls functions sending out
>>> data, which activates the Twisted reactor again and therefore leads
>>> to a possibly too deep recursion. I think I have a solution for
>>> that, it just wasn't necessary to implement it for now because the
>>> hack worked anyway.
>>
>> I guess we could simply not recurse if the recursion depth is too
>> big?
>>
>> At some point we have to let the recursive calls finish in order to
>> let the local variables and stack frames be garbage collected.
>
> Yes, but if we just stop recursing at a certain level, we might just
> stay at that level for a longer time. That is also not optimal. In my
> opinion, another point for the two-threaded solution.

You're talking about this two-threaded solution as if it is something that exists and will solve all our problems... But I still haven't seen it, and I would like to see how you can cleanly separate the work of the two threads. I'm afraid that the threads will be alternating between working and waiting on the other thread in such a way that we could just as well use one thread.

Also, threading in Python is unfortunately not optimal :-( The Python VM will use a single thread to execute the bytecode, even when using multiple threads in Python. It is only if the threads do things blocking for I/O that you will see a performance increase by using multiple threads.

Please see these links for some discussion on this so-called Global Interpreter Lock (GIL):

  http://www.artima.com/weblogs/viewpost.jsp?thread=214235
  http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock
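A small self-contained illustration of the point about the GIL: threads that merely block (here time.sleep stands in for waiting on I/O, since it releases the GIL) do run concurrently, while CPU-bound bytecode would be serialized onto one core. The function and its parameters are hypothetical, for demonstration only:

```python
import threading
import time

def run_blocking_tasks(n_threads, delay):
    """Time n_threads threads that merely block.  time.sleep releases
    the GIL, just like waiting on I/O does, so total wall-clock time
    stays near `delay` instead of n_threads * delay.  This is the only
    kind of speed-up the GIL permits for pure-Python threads."""
    threads = [threading.Thread(target=time.sleep, args=(delay,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

With four threads each sleeping 0.2 s, the total stays close to 0.2 s rather than 0.8 s.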
Re: [viff-devel] Multiparty AES in less than 3 seconds per block thanks to Twisted hack
>> For those who are not on the Twisted mailing list, the reply is
>> here:
>>
>>   http://twistedmatrix.com/pipermail/twisted-python/2009-February/019252.html
>>
>> There Jean-Paul Calderone says that he doesn't believe in a
>> re-entrant reactor, but he does not explain in detail why that is so.
>
> I think there are good reasons not to believe in a re-entrant
> reactor. My guess is that a generally re-entrant reactor could end up
> doing a lot of recursion where each recursion step holds onto a local
> scope and thereby keeps local variables from being reclaimed by the
> garbage collector. A simple loop does not have that problem, so I can
> understand why the Twisted guys want to keep the design as simple as
> possible.

Me too, I think I mentioned at some point that I didn't expect them to accept my hack. I'll try and write a mail to them to explain our problem in more detail.

> Maybe your short patch didn't provide enough information when taken
> out of context.

That may be true. I just thought that I don't want to bother him with VIFF code.

>> - It breaks some unit tests. I'm not sure whether it really breaks
>>   functionality or just the unit testing tool of Twisted.
>
> Are these VIFF (trial viff) or Twisted (trial twisted) unit tests? In
> any case, we have to fix this in order to keep our sanity :-)

The VIFF unit tests. Though I didn't execute the Twisted unit tests, I think that they would succeed, since there are no changes to Twisted if the loop call is not used.

>> As I said at the meeting, a possibility would be to go
>> multi-threaded. The Twisted maintainer suggested another way, but I
>> don't think that that way works for us.
>
> Do you mean the inlineCallbacks or the coiterate? And why does it not
> work?

As far as I understood it, inlineCallbacks would stop the VIFF code, which would make it not asynchronous anymore. Coiterate needs generator functions. Maybe I'm wrong, but I just consider the VIFF code to be too complex to be put in generator functions.
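For readers unfamiliar with the coiterate style mentioned above, here is a minimal pure-Python stand-in (not Twisted's actual API) showing the generator-based shape the code would have to take; this restructuring is exactly what is being judged too invasive for VIFF. All names are made up:

```python
def coiterate_sketch(tasks):
    """Minimal stand-in for the scheduling style of Twisted's
    task.coiterate: round-robin over generator 'tasks', advancing each
    one by a single step per scheduler pass, and dropping a task once
    it is exhausted."""
    order = []
    tasks = list(tasks)
    while tasks:
        for gen in list(tasks):      # copy: we may remove during the pass
            try:
                order.append(next(gen))
            except StopIteration:
                tasks.remove(gen)
    return order

def worker(name, steps):
    """A task must be written as a generator, yielding after each unit
    of work -- this is the rewrite the mail considers too complex for
    the existing VIFF code."""
    for i in range(steps):
        yield (name, i)
```

Two workers interleave one step at a time, so "a" and "b" alternate until "b" runs out.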
Re: [viff-devel] Mystery of the quadratic running time solved?
>>> Indeed we did not know (well I didn't) back then that the data was
>>> not sent immediately by Twisted, and I was starting to think
>>> yesterday whether the hack would make a difference. Lucky for us,
>>> it apparently does :)
>>
>> That is not the only problem. To free the memory of the shares and to
>> send out further shares, also the incoming shares must be processed
>> as soon as possible. This is even trickier because incoming shares
>> might trigger code that calls functions sending out data, which
>> activates the Twisted reactor again and therefore leads to a possibly
>> too deep recursion. I think I have a solution for that, it just
>> wasn't necessary to implement it for now because the hack worked
>> anyway.
>
> I guess we could simply not recurse if the recursion depth is too big?
>
> At some point we have to let the recursive calls finish in order to
> let the local variables and stack frames be garbage collected.

Yes, but if we just stop recursing at a certain level, we might just stay at that level for a longer time. That is also not optimal. In my opinion, another point for the two-threaded solution.
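The recursion problem described above is commonly avoided by trading the call stack for an explicit work queue (a trampoline). The sketch below is a generic illustration of that idea, not the actual VIFF solution; process_with_queue and expand are hypothetical names:

```python
from collections import deque

def process_with_queue(initial, expand):
    """Sketch: replace re-entrant recursive processing with an explicit
    FIFO queue.  expand(item) returns the follow-up items that recursion
    would otherwise have handled on a deeper stack frame; here they are
    appended and processed at the same stack depth, so each frame and
    its local variables are released immediately instead of piling up."""
    pending = deque(initial)
    processed = []
    while pending:
        item = pending.popleft()
        processed.append(item)       # stand-in for handling one share
        pending.extend(expand(item)) # follow-up work, no recursion
    return processed
```

A chain that would recurse n levels deep runs here with constant stack depth, which also sidesteps the "stay at that level for a longer time" objection to a fixed recursion limit.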
Re: [viff-devel] Mystery of the quadratic running time solved?
Marcel Keller writes:

>> I think we would get the same result if we started a LoopingCall that
>> executes process_deferred_queue with an interval of, say, 100 ms:
>>
>>   http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.task.LoopingCall.html
>>
>> This should work since the runUntilCurrent method runs through the
>> waiting calls and will trigger our process_deferred_queue method.
>>
>> And voilà -- no hacking of the Twisted source needed.
>
> I'm not sure, but LoopingCall._reschedule() looks more like it
> schedules the calls at a certain tick, not as soon as possible after
> the interval has elapsed. This might not cost too much time but still
> doesn't feel very elegant. Furthermore, setting the interval very low
> leads to high CPU usage when waiting. Again, this is not too bad but
> not elegant either. The same applies if using reactor.callLater()
> directly.

A looping call is just a higher-level wrapper for doing

  def reschedule(func):
      func()
      reactor.callLater(interval, reschedule, func)
  reschedule(func)

It will execute the function when the (now + interval) time has been reached and when the control flow returns to the reactor's event loop. We probably won't need the extra logic in a looping call, so we can just let the function reschedule itself like above. If we do this with an interval of 0, then the function will be called on each iteration through the reactor's event loop -- just like your loopCall, I believe?

> Of course, we can avoid hacking the Twisted code by extending it
> within VIFF. Still, I'm in favor of the two-threaded solution because
> it's more elegant, doesn't have the danger of recursing too deep, and,
> in my opinion, it should be feasible.

I have not yet seen the two-threaded approach -- do you have a patch?
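The self-rescheduling pattern above can be simulated without a reactor by letting a plain list stand in for the reactor's call queue: reactor.callLater(0, ...) becomes an append, and one pass over the list is one trip through the event loop. Everything here is a toy model, not Twisted code:

```python
def run_event_loop_sketch(func, passes):
    """Toy model of the callLater(0, ...) rescheduling pattern: `func`
    reschedules itself after running, so with interval 0 it executes
    exactly once per pass through the (simulated) event loop."""
    pending = []

    def reschedule():
        func()
        pending.append(reschedule)   # stands in for reactor.callLater(0, reschedule)

    pending.append(reschedule)
    for _ in range(passes):          # each pass is one event-loop iteration
        batch, pending = pending, []
        for call in batch:
            call()
    return passes
```

Calls queued during a pass land in the fresh list and run on the next pass, which is also how a real reactor keeps a callLater(0) chain from starving I/O handling.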
>>> diff -r e2759515f57f viff/runtime.py
>>> --- a/viff/runtime.py   Thu Mar 05 21:02:57 2009 +0100
>>> +++ b/viff/runtime.py   Fri Mar 06 13:43:14 2009 +0100
>>> @@ -306,6 +306,8 @@
>>>              deferred = deq.popleft()
>>>              if not deq:
>>>                  del self.incoming_data[key]
>>> +            # just queue
>>> +            self.factory.runtime.queue_deferred(deferred)
>>
>> Why is this done?
>
> At this time, we shouldn't call the callbacks because we might recurse
> into selectreactor.doSelect(). However, we want to know which
> deferreds are ready so we can call deferred.callback() later.

Uhh, this sounds a bit dangerous -- do we know exactly which Deferreds we can invoke callback on and which we cannot? As I remember, we invoke callback in a couple of other locations; is that safe?

>> If that doesn't matter, then I think this would be faster:
>>
>>   queue, self.deferred_queue = self.deferred_queue, []
>>   map(Deferred.unpause, queue)
>>
>> My idea is that looping over the list with map is faster than
>> repeatedly popping items from the beginning (an O(n) operation).
>
> But map() still would need O(n) time because that is the nature of
> calling a function n times, isn't it? Maybe the function calls are
> optimized, but the code in the function still is called n times.

Each pop(0) call is an O(n) operation, so we get O(n^2) here -- it is an expensive way to loop through a list. And now that I look at it, using map will still unpause the Deferreds in the order you added them. The difference is then that anything added to the queue as a result of the unpause calls will be processed the next time the code is called.

>> A question springs to my mind: calling
>>
>>   reactor.runUntilCurrent()
>>   reactor.doIteration(0)
>>
>> is the same as calling
>>
>>   reactor.iterate(0)
>>
>> and the documentation for that method says:
>>
>>   [...] This method is not re-entrant: you must not call it
>>   recursively; in particular, you must not call it while the reactor
>>   is running.
>> How does your code ensure that we only call myIteration when we're
>> not in a call made by the reactor? And could we simply call
>> reactor.iterate instead?
>
> We actually call it recursively, but it should be reentrant if it's
> not called from doIteration(). doIteration() is the same as
> selectreactor.doSelect(), which certainly is not reentrant. We,
> however, call it from the loop call (process_deferred_queue()) after
> doIteration().
>
> Calling reactor.iterate() is not enough because it doesn't call
> process_deferred_queue().

So if we call reactor.iterate and then runtime.process_deferred_queue as

  reactor.callLater(0, reactor.iterate)
  reactor.callLater(0, runtime.process_deferred_queue)

we should be fine? They would then both run, one after another, just as soon as the event loop is reached.

My goal is to violate as few Twisted docstrings as possible :-) And to use the tools provided by Twisted as much as possible. I would also like to hear what the Twisted guys have to say about calling reactor.iterate like this; it would be nice if they could say that it *is* in fact safe to do it like this.

> The prin
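The swap-and-drain idiom from the quoted suggestion can be demonstrated with a toy Deferred; ToyDeferred and drain are made-up names standing in for twisted.internet.defer.Deferred and process_deferred_queue. Swapping the list out is O(1) and the single walk is O(n), versus O(n^2) for draining via repeated pop(0):

```python
class ToyDeferred:
    """Toy stand-in for a paused Deferred: unpause() just records the
    order in which the queue processed it."""
    log = []

    def __init__(self, name):
        self.name = name

    def unpause(self):
        ToyDeferred.log.append(self.name)

def drain(queue):
    """Swap the whole queue out in O(1), then walk the batch once
    (the same effect as map(ToyDeferred.unpause, batch)).  Anything
    appended to `queue` while draining stays behind for the next call,
    matching the behaviour described in the mail."""
    batch, queue[:] = queue[:], []
    for d in batch:
        d.unpause()
    return len(batch)
```

Order is preserved, so the cleanliness argument for FIFO processing still holds with the faster idiom.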
Re: [viff-devel] Mystery of the quadratic running time solved?
> Wow, this is nice! I had sort of given up finding the cause of this
> :-( Thank you for looking at this, and just in time for my
> presentation at PKC in 10 days :-)

You're welcome. :-)

--- /usr/lib/python2.5/site-packages/twisted/internet/base.py  2008-07-29 22:13:54.0 +0200
+++ internet/base.py  2009-02-20 12:27:42.0 +0100
@@ -1127,17 +1127,32 @@
         self.startRunning(installSignalHandlers=installSignalHandlers)
         self.mainLoop()
 
+    def setLoopCall(self, f, *args, **kw):
+        self.loopCall = (f, args, kw)
+
+    def myIteration(self, t):
+        # Advance simulation time in delayed event
+        # processors.
+        self.runUntilCurrent()
+
+        if (t is None):
+            t2 = self.timeout()
+            t = self.running and t2
+
+        self.doIteration(t)
+
+        if ("loopCall" in dir(self)):
+            f, args, kw = self.loopCall
+            f(*args, **kw)
+
     def mainLoop(self):
         while self._started:
             try:
                 while self._started:
-                    # Advance simulation time in delayed event
-                    # processors.
-                    self.runUntilCurrent()
-                    t2 = self.timeout()
-                    t = self.running and t2
-                    self.doIteration(t)
+                    self.myIteration(None)
             except:
                 log.msg("Unexpected error in main loop.")
                 log.err()

> The changes above basically insert a call to self.loopCall after each
> doIteration invocation, right? When the loopCall is
> process_deferred_queue the main loop becomes:
>
>   self.runUntilCurrent()
>   self.doIteration(t)
>   runtime.process_deferred_queue()
>   self.runUntilCurrent()
>   self.doIteration(t)
>   runtime.process_deferred_queue()
>   ...

Yes, exactly.

> I think we would get the same result if we started a LoopingCall that
> executes process_deferred_queue with an interval of, say, 100 ms:
>
>   http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.task.LoopingCall.html
>
> This should work since the runUntilCurrent method runs through the
> waiting calls and will trigger our process_deferred_queue method.
>
> And voilà -- no hacking of the Twisted source needed.

I'm not sure, but LoopingCall._reschedule() looks more like it schedules the calls at a certain tick, not as soon as possible after the interval has elapsed.
This might not cost too much time but still doesn't feel very elegant. Furthermore, setting the interval very low leads to high CPU usage when waiting. Again, this is not too bad but not elegant either. The same applies if using reactor.callLater() directly.

Of course, we can avoid hacking the Twisted code by extending it within VIFF. Still, I'm in favor of the two-threaded solution because it's more elegant, doesn't have the danger of recursing too deep, and, in my opinion, it should be feasible.

diff -r e2759515f57f viff/runtime.py
--- a/viff/runtime.py   Thu Mar 05 21:02:57 2009 +0100
+++ b/viff/runtime.py   Fri Mar 06 13:43:14 2009 +0100
@@ -306,6 +306,8 @@
             deferred = deq.popleft()
             if not deq:
                 del self.incoming_data[key]
+            # just queue
+            self.factory.runtime.queue_deferred(deferred)

> Why is this done?

At this time, we shouldn't call the callbacks because we might recurse into selectreactor.doSelect(). However, we want to know which deferreds are ready so we can call deferred.callback() later.

+        #: Activation depth counter
+        self.depth_counter = 0

> This is for keeping track of the recursion depth in the future?

Yes. It was used in some debug output earlier, but I removed it to simplify the patch.

+    def queue_deferred(self, deferred):
+        deferred.pause()
+        self.deferred_queue.append(deferred)
+
+    def process_deferred_queue(self):
+        while(self.deferred_queue):
+            deferred = self.deferred_queue.pop(0)
+            deferred.unpause()

> Are you doing it this way to ensure that the Deferreds are unpaused
> in the same order as they were added to the list?

Yes. I'm not sure whether this is really necessary, but it seems just cleaner, because the callback of some deferred might do a lot of computations and recurse, which unnecessarily extends the lifetime of the remaining deferreds.
> If that doesn't matter, then I think this would be faster:
>
>   queue, self.deferred_queue = self.deferred_queue, []
>   map(Deferred.unpause, queue)
>
> My idea is that looping over the list with map is faster than
> repeatedly popping items from the beginning (an O(n) operation).

But map() still would need O(n) time because that is the nature of calling a function n times, isn't it? Maybe the function calls are optimized, but the code in the function still is called n times.

+    def activate_reactor(self):
+        self.activation_counter += 1
+
+        # setting the number to n makes the reactor called