Re: [Libevent-users] Bufferevent throughput experiments

2011-01-19 Thread Nick Mathewson
On Sun, Jan 16, 2011 at 4:39 PM, Marcel Roelofs
marcel.roel...@gmail.com wrote:

Hi, Marcel!

Thanks for the

I'd like to see a lot more optimization work in Libevent 2.1.  One
thing that this really needs IMO is more work on benchmarking and
profiling.  Chris Davis started work on a benchmarking tool last year;
his site seems to be down, so I've uploaded a copy to
git://github.com/nmathewson/libevent-bench.git .  I'd encourage
everybody else who'd like to work on Libevent performance to look into
using some reproducible and shareable benchmark mechanism (ideally, a
small program) so that we can try them out and see the impact of
different changes together.  This way, instead of just fixing
bottlenecks one by one, we can track performance over time and make
sure that we don't introduce performance regressions between releases.
 [...]
 So, I decided to do a little bit of profiling on my Windows box, and noticed a
 lot of time spent in grabbing the locks in read_complete and write_complete in
 bufferevent_async.c. After some further digging I noticed a couple of fixed
 buffer sizes limited to 16K and 4K in
   bufferevent_async.c
   bufferevent_ratelim.c
   buffer.c

Weird that this would have to do with buffer sizes; I would have
thought that the fixed maximum amount sent/received per call to
WSASend/WSARecv would have hurt more.  (Actually, is that what you
mean?  The 16K in bufferevent_async is not a buffer size; it's a read
limit.)

Another possibility to look into is whether we can better tune the
rest of our application by looking into algorithms with less lock
contention, or twiddling the SPIN_COUNT of the locks on Windows, or
what.

If possible, could you upload your profiles somewhere?  It would be
cool to see where else you found bottlenecks.

  [...]
 Now I'm not saying that 64K in internal buffers would be a good price to pay for
 every type of connection, but I can imagine that when throughput is important
 it would be nice if one could change these fixed buffer sizes through an API.

If we have to do that, so be it.  But personally, I'd prefer to check
out if there isn't some way we can get the Libevent code to do the
right thing here more automatically.  After all, nearly all users
would rather just leave the buffer sizes alone, so if we can find some
way to get better performance without forcing developers to mess with
magic numbers, that would make for an even easier-to-use library.

yrs,
-- 
Nick
___
Libevent-users mailing list
Libevent-users@monkey.org
http://lists.monkey.org:8080/listinfo/libevent-users


[Libevent-users] Bufferevent throughput experiments

2011-01-16 Thread Marcel Roelofs
I was interested in the max attainable throughput of libevent when using 
bufferevents, so I modified the le-proxy sample into a small performance test 
program, pushing 64K buffers as fast as possible over a single full duplex 
connection. 

I was a little bit disappointed by the results initially when running the test 
on localhost, compared to doing the same using a simple program using select 
(and large buffers in the read call!) directly: 

Linux (virtual machine) :  380 Mb/s  (cf. 1.45 Gb/s using select directly)
Windows                 :   82 Mb/s  (cf.  100 Mb/s using select directly)
Windows (IOCP)          :   93 Mb/s
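
The select-based comparison program is not included in this thread. As a
rough illustration only, a test of that general shape might look like the
sketch below: it pushes fixed-size buffers through a local socket pair with
select() and plain read()/write() calls and times the transfer. The names
pump_bytes and megabits_per_second are hypothetical, the sketch is POSIX-only,
and it assumes that "Mb" in the figures above means 10^6 bits.

```c
/* Hypothetical sketch, not the program used for the numbers in this thread. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Assumption: "Mb" means 10^6 bits. */
static double megabits_per_second(long long bytes, double seconds)
{
    return seconds > 0.0 ? (double)bytes * 8.0 / 1e6 / seconds : 0.0;
}

/* Send `total` bytes over a socket pair in chunks of at most `buf_size`,
 * reading them back with reads of at most `buf_size`. Returns the number of
 * bytes received, or -1 on error; *seconds receives the elapsed time. */
static long long pump_bytes(size_t buf_size, long long total, double *seconds)
{
    int fds[2];
    long long sent = 0, received = 0;
    struct timespec t0, t1;
    char *buf;

    assert(buf_size > 0 && total >= 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;
    /* One buffer serves both directions; the payload content is irrelevant. */
    buf = calloc(1, buf_size);
    if (!buf || set_nonblocking(fds[0]) != 0 || set_nonblocking(fds[1]) != 0) {
        free(buf);
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (received < total) {
        fd_set rset, wset;
        int maxfd = fds[0] > fds[1] ? fds[0] : fds[1];

        FD_ZERO(&rset);
        FD_ZERO(&wset);
        FD_SET(fds[1], &rset);
        if (sent < total)
            FD_SET(fds[0], &wset);
        if (select(maxfd + 1, &rset, &wset, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            received = -1;
            break;
        }
        if (FD_ISSET(fds[0], &wset)) {
            size_t want = buf_size;
            if ((long long)want > total - sent)
                want = (size_t)(total - sent);
            ssize_t n = write(fds[0], buf, want);
            if (n > 0) {
                sent += n;
            } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
                       errno != EINTR) {
                received = -1;
                break;
            }
        }
        if (FD_ISSET(fds[1], &rset)) {
            ssize_t n = read(fds[1], buf, buf_size);
            if (n > 0) {
                received += n;
            } else if (n == 0) {
                break; /* peer closed; not expected here */
            } else if (errno != EAGAIN && errno != EWOULDBLOCK &&
                       errno != EINTR) {
                received = -1;
                break;
            }
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *seconds = (double)(t1.tv_sec - t0.tv_sec) +
               (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(buf);
    close(fds[0]);
    close(fds[1]);
    return received;
}
```

Calling pump_bytes(65536, total, &secs) and passing the result to
megabits_per_second() gives a baseline figure; repeating the call with other
buffer sizes shows how sensitive the result is to the chunk size. A Unix
socket pair in a single process is not the same as a TCP connection on
localhost, so the absolute numbers are not comparable with the ones above.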

So, I decided to do a little bit of profiling on my Windows box, and noticed a 
lot of time spent in grabbing the locks in read_complete and write_complete in 
bufferevent_async.c. After some further digging I noticed a couple of fixed 
buffer sizes limited to 16K and 4K in 
   bufferevent_async.c 
   bufferevent_ratelim.c 
   buffer.c 
(some of which were marked FIXME, so I must be onto something ;)

When I changed all sizes to 64K, this gave quite a bit of improvement:

Linux (virtual machine) : 1220 Mb/s
Windows (select)        :   96 Mb/s
Windows (IOCP)          :  125 Mb/s

That's pretty close to the numbers when using select directly, and it 
exceeds them when using IOCP on Windows. In the latter case, the time spent 
acquiring the locks was also considerably less. 
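
One possible explanation for the drop in lock time, offered here as an
assumption rather than a measured result: if each read completion takes the
lock once, then a smaller per-call limit means more completions, and so more
lock acquisitions, for every 64K buffer the test program sends. The sketch
below only does that arithmetic; reads_needed is a hypothetical helper and
does not correspond to any Libevent function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of reads needed to drain `payload` bytes when each read returns
 * at most `read_limit` bytes. */
static size_t reads_needed(size_t payload, size_t read_limit)
{
    size_t calls = 0;

    assert(read_limit > 0);
    while (payload > 0) {
        size_t n = payload < read_limit ? payload : read_limit;
        payload -= n;
        calls++;
    }
    return calls;
}

/* Print the count for the limits mentioned in this thread. */
static void print_read_counts(void)
{
    const size_t limits[] = { 4096, 16384, 65536 };
    size_t i;

    for (i = 0; i < sizeof(limits) / sizeof(limits[0]); i++)
        printf("limit %zu: %zu reads per 64K buffer\n",
               limits[i], reads_needed(65536, limits[i]));
}
```

Under this assumption, a 16K limit costs four completions per 64K buffer and
a 4K limit costs sixteen, against one with a 64K limit.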

Some further experiments, playing with both the buffer sizes used by my test 
program as well as the internal libevent buffer sizes, seem to indicate that 
64K gives the best performance throughout.

Now I'm not saying that 64K in internal buffers would be a good price to pay for 
every type of connection, but I can imagine that when throughput is important 
it would be nice if one could change these fixed buffer sizes through an API.

Any thoughts?

Cheers,
Marcel Roelofs
