Re: [viff-devel] Two-threaded VIFF

2009-04-22 Martin Geisler
Marcel Keller writes:

> Hi friends of VIFF,
>
> I've finally completed the patch for a two-threaded VIFF where most of
> the VIFF code runs in a separate thread. The patch is against the tip
> of my repository: http://hg.viff.dk/mkeller

Okay. Your clone is about 70 commits and 6 weeks behind the current
version. There have been some changes to the ShareExchanger code since
then, which your code also touches. You should therefore pull in the
latest code and merge/rebase your patch.

But also, please look at splitting this up into smaller pieces. There
are a number of changes which do not logically belong together and
which should therefore be put into separate patches. Quoting from a
post on the Mercurial mailing list:

  http://www.selenic.com/pipermail/mercurial-devel/2008-August/007632.html

  First, the probability of a patch being acceptable is the *product*
  of the probabilities of each individual feature being acceptable.
  Which means it's a good idea to submit each feature separately.

  Second, the probability of there being a bug in the patch is the sum
  of the probabilities of there being a bug in each feature. Which
  means you want each feature in a separate commit for locating
  regressions.

  Third, the difficulty of reviewing a patch increases more than
  linearly with the number of features included. Which means we really
  don't want to review patches with more than one feature.

In this case we're changing the basic design of how things are
processed in VIFF -- we're introducing threads and locks, which are
traditionally the source of many funny errors. Therefore we need some
good documentation for this:

* what is waiting on what?

* where can we deadlock -- how do we avoid this?

* what are the hidden assumptions? (I saw that putting (None, None)
  into the deferred_queue signals that the thread should stop; see the
  sketch after this list.)

* maybe more :-)
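
The (None, None) convention alone deserves a few lines. Here is a
minimal sketch of the pattern I believe you are using -- the
queue-consuming loop and the thread setup are my guesses from reading
the patch, not your actual code:

  import threading
  from Queue import Queue

  deferred_queue = Queue()

  def viff_loop():
      # Runs in the separate VIFF thread: handle queued
      # (deferred, value) pairs until the (None, None)
      # sentinel tells us to stop.
      while True:
          deferred, value = deferred_queue.get()
          if (deferred, value) == (None, None):
              break                    # shutdown requested
          deferred.callback(value)     # fire the callback chain

  viff_thread = threading.Thread(target=viff_loop)
  viff_thread.start()

  # ... later, from the reactor thread at shutdown:
  deferred_queue.put((None, None))
  viff_thread.join()

If something like that (and the ways it can deadlock, e.g. if the
sentinel is never sent) is written down, the review becomes much
easier.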

> It turned out not to be as straightforward as I thought. I had to use
> recursion, as in the hack, but I refined it to ensure that the
> recursion limit isn't exceeded.

Please document this somewhere, maybe in doc/runtime.txt or maybe in a
new document which describes the design. There is a saying that if the
code was hard to write, it will be even harder to read afterwards. So
if you put all your cleverness into writing the code, you will not be
able to understand it when you read it again :-)
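
I don't know exactly how you refined the recursion, but even a toy
version of the guard would help future readers. Something along these
lines, where we keep a safety margin below the interpreter's limit and
restart from depth zero, is my guess at the shape of it (not your
actual code):

  import sys

  def drain(queue, depth=0):
      # Process one callback per recursion level, but stop a safe
      # margin below sys.getrecursionlimit() and tell the caller
      # to restart from depth 0 instead of overflowing the stack.
      if depth > sys.getrecursionlimit() - 100:
          return False
      if not queue:
          return True
      queue.pop(0)()                 # run one queued callback
      return drain(queue, depth + 1)

  queue = [lambda: None] * 5000
  while not drain(queue):
      pass                           # restart until the queue is empty

Whatever the real version looks like, please describe it in the design
document.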

> Benchmarks:
> Unfortunately, this solution is slower than the hack, e.g. one AES
> block encryption takes 4 seconds compared to 3 seconds with the hack.

The hack used a single thread but pumped messages out more quickly by
directly calling the send-stuff function in the reactor, right? Then
it's no big surprise that it was faster since it had no locking to do.

> On the other hand, the preprocessing time in the actively secure
> multiplication is linear rather than quadratic in the number of
> players, whereas the online time is significantly larger:
>
>            two-threaded         hack             original
> (n,t)    online  preproc.   online  preproc.  online  preproc.
> (4,1)       6       22         4       17        4       20
> (7,2)      10       37         6       29        6       42
> (10,3)     13       53         8       42        8       82
> (13,4)     17       68        10       56       10      136
> (16,5)     20       84        12       68       12      208
> (19,6)     23      106        13       83       14      287
> (22,7)     26      120        15       98       17      377

It is a shame that things become slower, but maybe that is the price
we have to pay for this more elaborate design.

> I did some profiling and didn't find an obvious reason why the
> two-threaded version is slower. Therefore, I guess that the reason is
> the multi-threading implementation of Python (which could be better,
> as mentioned in the discussion about the hack). This guess is also
> supported by the fact that having a separate thread for every
> callback, which I also tried, turned out to be really slow.
>
> Unit tests:
> All unit tests pass, even the previously skipped
> test_multiple_callbacks.

Excellent!

> This is because I added the @increment_pc decorator to
> schedule_callback(). This of course changes the program counters
> heavily, but I didn't experience any problems.
>
> Best regards,
> Marcel
> diff -r e2759515f57f apps/aes.py
> --- a/apps/aes.py Thu Mar 05 21:02:57 2009 +0100
> +++ b/apps/aes.py Tue Apr 21 17:25:45 2009 +0200
> @@ -82,7 +82,7 @@
>  rt.shutdown()
>  
>  g = gather_shares(opened_ciphertext)
> -g.addCallback(fin)
> +rt.schedule_callback(g, fin)
>  
>  def share_key(rt):
>  key =  []
> diff -r e2759515f57f apps/benchmark.py
> --- a/apps/benchmark.py   Thu Mar 05 21:02:57 2009 +0100
> +++ b/apps/benchmark.py   Tue Apr 21 17:25:45 2009 +0200
> @@ -91,6 +91,7 @@
>  print "Total time used: %.3f sec" % (stop-start)
>  print "Time per %s operation: %.0f ms" % (w

[viff-devel] Two-threaded VIFF

2009-04-21 Marcel Keller

Hi friends of VIFF,

I've finally completed the patch for a two-threaded VIFF where most of 
the VIFF code runs in a separate thread. The patch is against the tip of 
my repository: http://hg.viff.dk/mkeller


It turned out not to be as straightforward as I thought. I had to use 
recursion, as in the hack, but I refined it to ensure that the recursion 
limit isn't exceeded.


Benchmarks:
Unfortunately, this solution is slower than the hack, e.g. one AES block 
encryption takes 4 seconds compared to 3 seconds with the hack. On the 
other hand, the preprocessing time in the actively secure multiplication 
is linear rather than quadratic in the number of players, whereas the 
online time is significantly larger:


           two-threaded         hack             original
(n,t)    online  preproc.   online  preproc.  online  preproc.
(4,1)       6       22         4       17        4       20
(7,2)      10       37         6       29        6       42
(10,3)     13       53         8       42        8       82
(13,4)     17       68        10       56       10      136
(16,5)     20       84        12       68       12      208
(19,6)     23      106        13       83       14      287
(22,7)     26      120        15       98       17      377

I did some profiling and didn't find an obvious reason why the 
two-threaded version is slower. Therefore, I guess that the reason is the 
multi-threading implementation of Python (which could be better, as 
mentioned in the discussion about the hack). This guess is also supported 
by the fact that having a separate thread for every callback, which I 
also tried, turned out to be really slow.
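
For what it's worth, CPython's global interpreter lock means that two
threads never execute Python bytecode in parallel, so for CPU-bound
work a second thread mostly adds switching overhead. A quick toy check
(nothing VIFF-specific):

  import threading, time

  def spin(n):
      # Pure-Python busy loop; holds the GIL while it runs.
      while n:
          n -= 1

  N = 10 ** 7

  start = time.time()
  spin(N); spin(N)
  print "sequential:  %.2f s" % (time.time() - start)

  start = time.time()
  threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
  for t in threads: t.start()
  for t in threads: t.join()
  print "two threads: %.2f s" % (time.time() - start)

On one core the threaded run is typically no faster than the
sequential one, and the lock handover often makes it slower.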


Unit tests:
All unit tests pass, even the previously skipped 
test_multiple_callbacks. This is because I added the @increment_pc 
decorator to schedule_callback(). This of course changes the program 
counters heavily, but I didn't experience any problems.
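
For reference, the point of the decorator is to give every scheduled
callback its own unique program counter by entering a fresh counter
level around each call. Simplified, the idea is roughly this (a
sketch, not the exact VIFF code):

  from functools import wraps

  def increment_pc(method):
      # Bump the current program counter entry and push a new
      # sub-counter for the duration of the call, so that every
      # nested operation gets a unique program counter.
      @wraps(method)
      def inc_pc_wrapper(self, *args, **kwargs):
          try:
              self.program_counter[-1] += 1
              self.program_counter.append(0)
              return method(self, *args, **kwargs)
          finally:
              self.program_counter.pop()
      return inc_pc_wrapper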


Best regards,
Marcel
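
PS: The recurring change in the patch below is that callbacks are no
longer chained directly onto a Deferred but routed through the
runtime's schedule_callback(), so that they run under the runtime's
control and get a program counter via @increment_pc. For example, in
millionaires.py:

  # before: the callback is chained directly onto the Deferred
  results.addCallback(lambda _: runtime.synchronize())

  # after: the runtime schedules the callback
  runtime.schedule_callback(results, lambda _: runtime.synchronize())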
diff -r e2759515f57f apps/aes.py
--- a/apps/aes.py	Thu Mar 05 21:02:57 2009 +0100
+++ b/apps/aes.py	Tue Apr 21 17:25:45 2009 +0200
@@ -82,7 +82,7 @@
 rt.shutdown()
 
 g = gather_shares(opened_ciphertext)
-g.addCallback(fin)
+rt.schedule_callback(g, fin)
 
 def share_key(rt):
 key =  []
diff -r e2759515f57f apps/benchmark.py
--- a/apps/benchmark.py	Thu Mar 05 21:02:57 2009 +0100
+++ b/apps/benchmark.py	Tue Apr 21 17:25:45 2009 +0200
@@ -91,6 +91,7 @@
 print "Total time used: %.3f sec" % (stop-start)
 print "Time per %s operation: %.0f ms" % (what, 1000*(stop-start) / count)
 print "*" * 6
+sys.stdout.flush()
 
 
 operations = ["mul", "compToft05", "compToft07", "eq"]
@@ -174,7 +175,7 @@
 # run with no preprocessing. So they are quite brittle.
 if self.operation == operator.mul:
 key = ("generate_triples", (Zp,))
-desc = [(i, 1, 0) for i in range(2, 2 + count)]
+desc = [(2, 2, 2 * count + 1, 2, i, 1, 0) for i in range(1, count + 1)]
 program_desc.setdefault(key, []).extend(desc)
 elif isinstance(self.rt, ComparisonToft05Mixin):
 key = ("generate_triples", (GF256,))
@@ -228,7 +229,8 @@
 if seconds > 0:
 print "Starting test in %d" % seconds
 sys.stdout.flush()
-reactor.callLater(1, self.countdown, None, seconds - 1)
+time.sleep(1)
+self.countdown(None, seconds - 1)
 else:
 print "Starting test now"
 sys.stdout.flush()
@@ -255,6 +257,7 @@
 def run_test(self, _):
 c_shares = []
 record_start("parallel test")
+sys.stdout.flush()
 while self.a_shares and self.b_shares:
 a = self.a_shares.pop()
 b = self.b_shares.pop()
diff -r e2759515f57f apps/millionaires.py
--- a/apps/millionaires.py	Thu Mar 05 21:02:57 2009 +0100
+++ b/apps/millionaires.py	Tue Apr 21 17:25:45 2009 +0200
@@ -97,10 +97,10 @@
 # the argument (which is None since self.results_ready does
 # not return anything), so we throw it away using a lambda
 # expressions which ignores its first argument.
-results.addCallback(lambda _: runtime.synchronize())
+runtime.schedule_callback(results, lambda _: runtime.synchronize())
 # The next callback shuts the runtime down, killing the
 # connections between the players.
-results.addCallback(lambda _: runtime.shutdown())
+runtime.schedule_callback(results, lambda _: runtime.shutdown())
 
 def results_ready(self, results):
 # Since this method is called as a callback above, the results
diff -r e2759515f57f viff/active.py
--- a/viff/active.py	Thu Mar 05 21:02:57 2009 +0100
+++ b/viff/active.py	Tue Apr 21 17:25:45 2009 +0200
@@ -501,6 +501,7 @@
 result = Share(self, share_x.field)
 # This is the Deferred we will do processing on.
 triple = self.get_triple(share_x.field)
+