Re: cprng_fast implementation benchmarks

Thor Lancelot Simon Wed, 23 Apr 2014 07:00:28 -0700

On Wed, Apr 23, 2014 at 02:34:33PM +0100, Mindaugas Rasiukevicius wrote:
> Thor Lancelot Simon <t...@panix.com> wrote:
> > On Wed, Apr 23, 2014 at 01:16:42PM +0100, Mindaugas Rasiukevicius wrote:
> > > 
> > > You mentioned that 4GB of data are generated by requesting 256 bytes.
> > > It would be more interesting to see the throughput of 4 byte requests
> > > i.e. how many cprng_fast32() calls per second can we do?
> > 
> > That is how the numbers in the "cpb" column were generated: I modified
> > the kernel code so that after each rekeying, it calls cprng_fast32()
> > 1,000,000 times in a tight loop to generate 4,000,000 bytes.
> 
> OK.  Can you provide the time in seconds?  Having throughput per second
> is useful for general contemplation and comparison with the rates other
> subsystems can achieve.


This CPU has a 2.4GHz clock, so we'd be looking at 109 MB/sec for chacha8
called via cprng_fast32(), assuming I counted the zeroes correctly.

Interestingly, the rightmost column of the data at
http://bench.cr.yp.to/results-stream.html suggests our implementation is about as
good as anyone else's for these inconveniently short requests.  In the
tables at that URL it is easy to see the overhead come to predominate as
the requests get shorter.  I suspect it might not even help if we
buffered aggressively, since I bet the copies would be quite costly.

However, I'm a little suspicious of the test framework they used, since
it shows truly immense overhead for short requests on MIPS (Loongson)
and ARM, while x86 and PPC seem fine.  I wonder if it is doing something
dumb like calling the core transform through a function pointer.  If we
really want to be sure, we'll need to run similar tests ourselves in-kernel.

Thor
