Re: [viff-devel] What to benchmark

2008-09-24 Thread Martin Geisler
[EMAIL PROTECTED] writes:

> Quoting Mikkel Krøigård <[EMAIL PROTECTED]>:
>
>> Citat Martin Geisler <[EMAIL PROTECTED]>:
>>
>> > I've looked at the GMPY code, and it is a fairly straightforward
>> > wrapper for the GMP library, as you describe.
>> >
>> > But I don't know if it makes it easier for us to benchmark just
>> > because it is split into its own C code...
>>
>> I never said it would. If you use this approach, it is easy to see
>> how much is spent on the dangerous arithmetic, but I guess a
>> profiler could tell you how much time Python spends on the
>> functions implementing the operators anyway.
>
> If that's the case, then it doesn't make sense w.r.t. the profiling
> to use GMPY. I was assuming the profiler could not give you
> information that was so fine-grained.

Some time ago I built in support for the normal Python profiler in all
VIFF programs, so if you run the benchmark with --profile you get
results like this for 4000 multiplications:

 889364 function calls (762586 primitive calls) in 17.830 CPU seconds

   Ordered by: internal time, call count
   List reduced from 192 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       44    7.350    0.167   12.177    0.277 selectreactor.py:82(doSelect)

The line above says that the internal time spent on 44 calls to
doSelect was 7.35 seconds. The cumtime includes time spent in
functions called from doSelect, most importantly the actual select
call done there. I interpret this to mean that the program slept for 5
seconds (12 - 7 = 5) out of the 18 seconds it ran.

First, note that this sleeping includes the 3 second countdown done by
the benchmark and the time it takes for the programs to connect to
each other initially (the data is for Player 3 of 3, and this player
waits for Players 1 and 2).

Second, the profiling slows things down -- I'm not sure exactly how
much, but maybe by a factor of 2. So it is the relative numbers
that are important.


 70684/41342    2.330    0.000    8.669    0.000 defer.py:306(_runCallbacks)
     6666    0.648    0.000    1.768    0.000 shamir.py:30(share)

This player did a total of 6666 Shamir sharings: it was responsible
for 8000/3 = 2666 of the initial 8000 Shamir sharings, and then it did
4000 resharings as part of the secure multiplications.

    36007    0.601    0.000    0.765    0.000 field.py:371(__mul__)

This is the overloaded multiplication operator for field elements, the
one in pure Python code.

 85360/41360    0.543    0.000    5.057    0.000 defer.py:168(addCallbacks)
   104021    0.427    0.000    0.427    0.000 field.py:339(__init__)

Total number of field elements created: 104021. This includes very
short-lived objects created from an expression like "x + y + z" where
"x + y" creates a temporary field element which is then added to z.

    13338    0.364    0.000    0.798    0.000 runtime.py:627(_expect_share)
    31998    0.358    0.000    0.483    0.000 field.py:342(__add__)

Additions in the field. The Shamir sharing and recombination both do
additions.

     8004    0.319    0.000    1.615    0.000 runtime.py:203(__init__)
     4000    0.275    0.000    0.708    0.000 shamir.py:95(recombine)

This is the time it takes to recombine 4000 Shamir shares.

    13340    0.263    0.000    4.463    0.000 runtime.py:286(stringReceived)
     8000    0.251    0.000    1.538    0.000 runtime.py:1041(shamir_share)

The time it took to do the initial Shamir sharing of 8000 numbers.

       48    0.249    0.005    4.712    0.098 basic.py:345(dataReceived)
 28015/21342    0.234    0.000    4.212    0.000 defer.py:283(_startRunCallbacks)
 49350/21350    0.226    0.000    4.613    0.000 defer.py:185(addCallback)
 28015/21342    0.216    0.000    4.381    0.000 defer.py:229(callback)
 20002/12002    0.212    0.000    7.126    0.001 runtime.py:372(inc_pc_wrapper)

Program counter book-keeping.

    28010    0.196    0.000    0.403    0.000 runtime.py:80(__init__)
    13338    0.190    0.000    0.347    0.000 abstract.py:164(write)
 32006/28004    0.171    0.000    1.420    0.000 runtime.py:227(_callback_fired)
    13338    0.170    0.000    0.170    0.000 runtime.py:593(_expect_data)
    12006    0.159    0.000    1.238    0.000 runtime.py:611(_exchange_shares)
     4000    0.155    0.000    2.674    0.001 runtime.py:1025(_shamir_share)
     9332    0.141    0.000    0.270    0.000 random.py:148(randrange)
     9332    0.129    0.000    0.129    0.000 random.py:218(_randbelow)
    13338    0.120    0.000    0.467    0.000 basic.py:357(sendString)
    13336    0.118    0.000    0.585    0.000 runtime.py:325(sendData)
     4000    0.112    0.000    5.460    0.001 runtime.py:791(mul)
     8000    0.109    0.000    4.392    0.001 runtime.py:553(schedule_callback)
    13338    0.078    0.000    0.124    0.000 runtime.py:629()
    13343    0.078    0.000    0.157    0.000 tcp.py:260(startWriting)
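(For readers who want to reproduce this kind of listing for their own
code: it is the output of Python's built-in profiler. A minimal
stand-alone sketch using today's cProfile/pstats API, with a toy
workload standing in for a VIFF computation, would be:)

```python
import cProfile
import io
import pstats

def work():
    # Toy stand-in for a VIFF computation: some modular arithmetic.
    p = 2 ** 31 - 1
    total = 0
    for i in range(1, 10000):
        total = (total + i * i) % p
    return total

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by internal time (tottime), like the listing above.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(10)
print(buf.getvalue())
```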

Re: [viff-devel] What to benchmark

2008-09-24 Thread ivan
Quoting Mikkel Krøigård <[EMAIL PROTECTED]>:

> Citat Martin Geisler <[EMAIL PROTECTED]>:
>
> > I've looked at the GMPY code, and it is a fairly straightforward
> > wrapper for the GMP library, as you describe.
> >
> > But I don't know if it makes it easier for us to benchmark just
> > because it is split into its own C code...
> I never said it would. If you use this approach, it is easy to see how
> much is spent on the dangerous arithmetic, but I guess a profiler could
> tell you how much time Python spends on the functions implementing the
> operators anyway.

If that's the case, then it doesn't make sense w.r.t. the profiling to
use GMPY. I was assuming the profiler could not give you information that was so
fine-grained.

But at least it is good news that Sigurd saw a speed-up from using C, albeit on
large numbers. It indicates that the raw computing time is not completely
dwarfed by bookkeeping etc.

>
> It is not completely unimaginable, however, that someone would want to
> know how much actually goes on inside gmpy (arithmetic on big numbers,
> the data) and how much goes on outside (counting variables, various
> kinds of overhead).

That someone is me. I think it is important to know what fraction of the time we
spend on computing we HAVE to do.
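(One way to get at the inside-gmpy/outside-gmpy split would be to make
the integer type backing a field element pluggable, and benchmark the
two variants separately. The class below is a made-up minimal sketch
for illustration, not VIFF's actual field.py:)

```python
class FieldElement:
    # Backing integer type; one could swap in gmpy.mpz here (if gmpy is
    # installed) and rerun a benchmark to isolate the C-backed arithmetic.
    wrap = int

    def __init__(self, value, modulus):
        self.modulus = modulus
        self.value = self.wrap(value) % modulus

    def __mul__(self, other):
        return FieldElement(self.value * other.value, self.modulus)

    def __eq__(self, other):
        return self.value == other.value and self.modulus == other.modulus

p = 2 ** 31 - 1  # an arbitrary prime modulus for the example
a = FieldElement(1234567, p)
b = FieldElement(7654321, p)
print((a * b).value == (1234567 * 7654321) % p)
```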

regards, Ivan
___
viff-devel mailing list (http://viff.dk/)
viff-devel@viff.dk
http://lists.viff.dk/listinfo.cgi/viff-devel-viff.dk


Re: [viff-devel] Some profiling results

2008-09-24 Thread ivan
Quoting Martin Geisler <[EMAIL PROTECTED]>:

> Martin Geisler <[EMAIL PROTECTED]> writes:
>
> > Hi everybody,
> >
> > I have done some testing and come up with some strange numbers. I
> > measured the time each individual multiplication takes by storing a
> > timestamp when the multiplication is scheduled, and another when it
> > finishes.
>
> Here is another plot which also shows when each multiplication is
> started and how long it takes.

I agree with Mikkel that it seems to make sense that it looks this way.
But of course we would have been happier if the first multiplication did
not have to wait for so long. In particular it seems it is waiting longer
the more multiplications you ask for in total, right?
This is certainly something we don't want. I don't know nearly enough
about how this works to say what to do about it...

regards, Ivan



Re: [viff-devel] What to benchmark

2008-09-24 Thread Mikkel Krøigård
Citat Martin Geisler <[EMAIL PROTECTED]>:

> I've looked at the GMPY code, and it is a fairly straightforward
> wrapper for the GMP library, as you describe.
>
> But I don't know if it makes it easier for us to benchmark just
> because it is split into its own C code...
I never said it would. If you use this approach, it is easy to see how much is
spent on the dangerous arithmetic, but I guess a profiler could tell you how
much time Python spends on the functions implementing the operators anyway.

It is not completely unimaginable, however, that someone would want to know how
much actually goes on inside gmpy (arithmetic on big numbers, the data) and how
much goes on outside (counting variables, various kinds of overhead).




Re: [viff-devel] What to benchmark

2008-09-24 Thread Martin Geisler
Mikkel Krøigård <[EMAIL PROTECTED]> writes:

> I remember trying out how to implement Python modules in C, and you
> needed to define special functions that map to C functions.
> Presumably there is something of the same kind going on inside gmpy
> that we can measure separately from the rest of the Python code. I
> am not familiar with the profilers though, and I could be wrong.

I've looked at the GMPY code, and it is a fairly straightforward
wrapper for the GMP library, as you describe.

But I don't know if it makes it easier for us to benchmark just
because it is split into its own C code...

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.


Re: [viff-devel] What to benchmark

2008-09-24 Thread Sigurd Torkel Meldgaard
I have made a quick change to the field implementation so that it uses
gmpy instead of internal longs, and it seems that for integers in the
interval we usually use (~64 bits) it actually runs a bit slower (an
equality test took 808 ms instead of 767). I made a run with a 300-bit
modulus, and there gmpy showed some speedup (965 ms against 1199 ms).
All tests were done locally on camel12.

I attach the patch if you want to play with it - maybe there is a much
better way to do the interfacing.
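(Sigurd's pattern, where the C library only pays off for large
operands, can be sanity-checked even without gmpy: plain Python longs
show how the per-multiplication cost grows with the modulus size. The
moduli and iteration count below are arbitrary illustration, not
VIFF's actual parameters:)

```python
import timeit

def bench(bits, n=10000):
    # Time n modular multiplications with operands of roughly `bits` bits.
    p = (1 << bits) - 1  # not a prime; only the size matters here
    a, b = p // 3, p // 7

    def run():
        x = a
        for _ in range(n):
            x = (x * b) % p
        return x

    return timeit.timeit(run, number=1)

for bits in (64, 300):
    print("%3d-bit modulus: %.4f s for 10000 multiplications" % (bits, bench(bits)))
```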

Sigurd


gmpy
Description: Binary data


Re: [viff-devel] What to benchmark

2008-09-24 Thread ivan
Quoting Mikkel Krøigård <[EMAIL PROTECTED]>:

> Citat [EMAIL PROTECTED]:
>
> > I think there was earlier some version where arithmetic was done by
> > calling some external C code.
> We are easily able to switch between gmpy (which is implemented in C)
> and Python arithmetic, if that's what you mean.

Well, I guess that's part of what I meant. Certainly, if you can measure
how much time is spent inside the C code, this will say how much "raw"
time is spent on arithmetic. This assumes, however, that all arithmetic,
even down to simple additions, is done this way. Then there are other
things, such as computing PRFs, which I suppose are not done using GMP?
Those would have to be measured separately.

>
> > Exercise: if you can measure 1) and 2), how do you measure 3)?  :-)
> That's one tough equation.
>

Many years of experience as a university teacher allow me to ask almost
impossible questions with surprising ease...

regards, Ivan



Re: [viff-devel] What to benchmark

2008-09-24 Thread Martin Geisler
[EMAIL PROTECTED] writes:

> I think there was earlier some version where arithmetic was done by
> calling some external C code. From that I am guessing that it is
> feasible to make a version where all or most of the stuff in 1) is
> done by calling specific functions we can name and track rather than
> using the internal Python arithmetic, for instance. In such a
> version, it should be possible to find out how much time is spent on
> 1). If this gets much slower than the normal version, we are in
> trouble and then I don't know what to do.

Sigurd is actually testing this at this very moment (we talked about
it on IRC) and I hope he will give some benchmark results. This is
about using GMPY for field arithmetic:

  http://tracker.viff.dk/issue10

> 2) I suppose can be measured by hooking into the event loop of
> Twisted

That was what I described in the mail before -- I saw very few calls
to the select() function, which is the one used in the event loop to
sleep while waiting for data from a set of file descriptors.
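(A rough sketch of that hooking-in: wrap the blocking select call and
accumulate the time spent inside it. This is a stand-alone
illustration with made-up names, not actual Twisted reactor patching:)

```python
import select
import time

IDLE = [0.0]  # accumulated idle time, in seconds

def timed_select(r, w, x, timeout):
    # Same signature as select.select; records how long we were blocked.
    start = time.time()
    ready = select.select(r, w, x, timeout)
    IDLE[0] += time.time() - start
    return ready

# With no file descriptors to watch, the whole timeout is idle time.
# (select with three empty lists works on POSIX systems.)
timed_select([], [], [], 0.05)
print("idle so far: %.3f s" % IDLE[0])
```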

> Exercise: if you can measure 1) and 2), how do you measure 3)?  :-)

Hehe :-)

-- 
Martin Geisler



Re: [viff-devel] Some profiling results

2008-09-24 Thread Mikkel Krøigård
Citat Martin Geisler <[EMAIL PROTECTED]>:

> Martin Geisler <[EMAIL PROTECTED]> writes:
>
> > Hi everybody,
> >
> > I have done some testing and come up with some strange numbers. I
> > measured the time each individual multiplication takes by storing a
> > timestamp when the multiplication is scheduled, and another when it
> > finishes.
>
> Here is another plot which also shows when each multiplication is
> started and how long it takes.
>
>
I guess the first multiplication is so slow because you're busy scheduling the
rest. Notice that no multiplication actually finishes until they have all been
started. This diagram makes sense in my mind at least.



Re: [viff-devel] Some profiling results

2008-09-24 Thread Martin Geisler
Martin Geisler <[EMAIL PROTECTED]> writes:

> Hi everybody,
>
> I have done some testing and come up with some strange numbers. I
> measured the time each individual multiplication takes by storing a
> timestamp when the multiplication is scheduled, and another when it
> finishes.

Here is another plot which also shows when each multiplication is
started and how long it takes.

-- 
Martin Geisler



Re: [viff-devel] What to benchmark

2008-09-24 Thread Mikkel Krøigård
Citat [EMAIL PROTECTED]:

> I think there was earlier some version where arithmetic was done by
> calling some external C code.
We are easily able to switch between gmpy (which is implemented in C) and Python
arithmetic, if that's what you mean.

I remember trying out how to implement Python modules in C, and you needed to
define special functions that map to C functions. Presumably there is something
of the same kind going on inside gmpy that we can measure separately from the
rest of the Python code. I am not familiar with the profilers though, and I
could be wrong.

> Exercise: if you can measure 1) and 2), how do you measure 3)?  :-)
That's one tough equation.



Re: [viff-devel] What to benchmark

2008-09-24 Thread ivan
Quoting Martin Geisler <[EMAIL PROTECTED]>:

> > Note that I am not saying we are in that situation, in fact I don't
> > think so - but I am saying that it is important to find out ASAP!
>
> Agreed! I would be very happy to hear suggestions as to how we can
> measure things in VIFF and/or Twisted.

Well, it seems to me it makes sense to split the time spent into three
classes:
1) necessary local computing (such as arithmetic on shares, computing
PRFs, etc.)
2) idle time, while waiting for messages from the others
3) anything else

and the most basic information we want is how large these three are
relative to each other.
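(As a toy illustration of this bookkeeping, with made-up workloads
standing in for classes 1 and 2, class 3 falls out as the remainder of
the wall-clock time:)

```python
import time

def measure():
    t_start = time.time()

    # Class 1: necessary local computing (here: some modular arithmetic).
    s = time.time()
    p = 2 ** 61 - 1
    x = 1
    for i in range(2, 20000):
        x = (x * i) % p
    t_arith = time.time() - s

    # Class 2: idle time, simulated by sleeping as if waiting for messages.
    s = time.time()
    time.sleep(0.05)
    t_idle = time.time() - s

    # Class 3: whatever is left of the total.
    t_other = (time.time() - t_start) - t_arith - t_idle
    return t_arith, t_idle, t_other

t1, t2, t3 = measure()
print("computing %.4f s, idle %.4f s, other %.4f s" % (t1, t2, t3))
```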

I think there was earlier some version where arithmetic was done by
calling some external C code. From that I am guessing that it is
feasible to make a version where all or most of the stuff in 1) is done
by calling specific functions we can name and track, rather than using
the internal Python arithmetic. In such a version, it should be possible
to find out how much time is spent on 1). If this gets much slower than
the normal version, we are in trouble, and then I don't know what to do.
2), I suppose, can be measured by hooking into the event loop of Twisted.

Exercise: if you can measure 1) and 2), how do you measure 3)?  :-)

regards, Ivan






Re: [viff-devel] Some profiling results

2008-09-24 Thread Martin Geisler
[EMAIL PROTECTED] writes:

> Hi Martin,
>
> I have a couple of stupid questions:
>
> Quoting Martin Geisler <[EMAIL PROTECTED]>:
>>
>> I've attached two plots, one for 1000 multiplications and one for
>> 4000. Each plot has the multiplication-number on the x-axis and the
>> time for that multiplication on the y-axis.
>
> If you have done 1000, resp. 4000 mult's, why do the x-axes start at
> 2000, resp. 8000?

Ah, good question: the numbers are taken from the current program
counter at the time when the multiplication is scheduled. And it turns
out to start at about 2n since we start by doing 2n Shamir secret
sharings to get n pairs.
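(The bookkeeping behind that offset is simple enough to write down as
a one-line relation:)

```python
def pc_offset(n_mults):
    # Each multiplication consumes one pair of sharings, and every pair
    # costs two Shamir sharings up front, so the program counter has
    # already advanced by about 2n when the first multiplication starts.
    return 2 * n_mults

print(pc_offset(1000), pc_offset(4000))  # -> 2000 8000
```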

> And if you have measured time for individual multiplications, why
> are the numbers on the y-axis smaller in the 1000 multiplication
> case? Shouldn't they take about the same amount of time in both
> cases?

Yes, that was what I expected too! I would at least expect the final
multiplications to take about equally long, even if the first ones
wait longer when doing 4000 multiplications.

-- 
Martin Geisler



Re: [viff-devel] What to benchmark

2008-09-24 Thread Martin Geisler
[EMAIL PROTECTED] writes:

Hi!

> I think that looking at the two bottlenecks as suggested is a very
> good idea and should have the highest priority. We don't know nearly
> enough about how much time Python in general and Twisted in
> particular is stealing from us. In principle, we could be in a
> situation where it is hopeless to get the performance we should be
> able to get unless we ditch Python and implement Twisted ourselves
> in C++.

Yes, we need a much better understanding of what is going on. Only
then can we start talking about what kind of performance it is
realistic to hope for, and then look at how to achieve it.

> Note that I am not saying we are in that situation, in fact I don't
> think so - but I am saying that it is important to find out ASAP!

Agreed! I would be very happy to hear suggestions as to how we can
measure things in VIFF and/or Twisted.

I have tried looking at the idle time in the reactor (Twisted's event
loop) and found that it blocks on the select call only 40 times when
multiplying 4000 numbers. It starts by sleeping 20 ms over 23 select
calls, and then alternates between the following two steps:

* 315-317 multiplications finish
* it sleeps 10-11 ms in 2 select calls

I don't know yet what the number 315 corresponds to...

-- 
Martin Geisler



Re: [viff-devel] Some profiling results

2008-09-24 Thread ivan
Hi Martin,

I have a couple of stupid questions:

Quoting Martin Geisler <[EMAIL PROTECTED]>:
>
> I've attached two plots, one for 1000 multiplications and one for
> 4000. Each plot has the multiplication-number on the x-axis and the
> time for that multiplication on the y-axis.

If you have done 1000, resp. 4000 mult's, why do the x-axes start at
2000, resp. 8000? And if you have measured time for individual
multiplications, why are the numbers on the y-axis smaller in the 1000
multiplication case? Shouldn't they take about the same amount of time
in both cases?

regards, Ivan


Re: [viff-devel] Some profiling results

2008-09-24 Thread Mikkel Krøigård
Citat Martin Geisler <[EMAIL PROTECTED]>:

> In both plots we see that the first multiplication takes very long, it
> is sort of waiting on the other multiplications. I think this is
> because we're not yielding to the reactor when we start all the
> multiplications.
>
> This also means that no network communication is started for the first
> multiplication until after all multiplications have been scheduled --
> this is actually not what we want...
>
> Here are the plots, please let me know what you think of this.

At a glance, it does look like the timing is being done correctly. Right now I
can only confirm that there's definitely something funny going on in those
plots.




[viff-devel] Some profiling results

2008-09-24 Thread Martin Geisler
Hi everybody,

I have done some testing and come up with some strange numbers. I
measured the time each individual multiplication takes by storing a
timestamp when the multiplication is scheduled, and another when it
finishes.

I've attached two plots, one for 1000 multiplications and one for
4000. Each plot has the multiplication-number on the x-axis and the
time for that multiplication on the y-axis.

In both plots we see that the first multiplication takes very long; it
is sort of waiting on the other multiplications. I think this is
because we're not yielding to the reactor when we start all the
multiplications.

This also means that no network communication is started for the first
multiplication until after all multiplications have been scheduled --
this is actually not what we want...

Here are the plots, please let me know what you think of this.


This is the change I made, in case someone wants to play around with
this:

diff --git a/viff/runtime.py b/viff/runtime.py
--- a/viff/runtime.py
+++ b/viff/runtime.py
@@ -49,6 +49,18 @@
 from twisted.internet.defer import maybeDeferred
 from twisted.internet.protocol import ReconnectingClientFactory, ServerFactory
 from twisted.protocols.basic import Int16StringReceiver
+
+PHASES = {}
+
+def begin(result, phase):
+    PHASES[phase] = time.time()
+    return result
+
+def end(result, phase):
+    stop = time.time()
+    start = PHASES.pop(phase, 0)
+    print "Finished %s in %f sec" % (phase, stop - start)
+    return result
 
 
 class Share(Deferred):
@@ -797,6 +809,9 @@
             result.addCallback(lambda a: a * share_b)
             return result
 
+        phase = "mul-%d" % self.program_counter[0]
+        begin(None, phase)
+
         # At this point both share_a and share_b must be Share
         # objects. So we wait on them, multiply and reshare.
         result = gather_shares([share_a, share_b])
@@ -804,6 +819,7 @@
         self.schedule_callback(result, self._shamir_share)
         self.schedule_callback(result, self._recombine,
                                threshold=2*self.threshold)
+        result.addCallback(end, phase)
         return result
 
     @increment_pc
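(For anyone who wants to play with the same phase-timing idea outside
a patched VIFF tree, the two helpers from the patch can be run
stand-alone; here they are ported to Python 3 syntax:)

```python
import time

PHASES = {}

def begin(result, phase):
    # Record when `phase` started; pass `result` through unchanged so
    # the helper can be dropped into a Twisted callback chain.
    PHASES[phase] = time.time()
    return result

def end(result, phase):
    # Pop the matching start time and report the elapsed wall-clock time.
    stop = time.time()
    start = PHASES.pop(phase, 0)
    print("Finished %s in %f sec" % (phase, stop - start))
    return result

begin(None, "mul-42")
time.sleep(0.01)
end(None, "mul-42")
```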



-- 
Martin Geisler
