>Here is one set of test results.  This is between two 2-processor
>systems with mthca.
>
>winverbs
>threads     1      2      4
>conn/sec    644    1294   1661
>accept      1363   1228   1566
>disc        814    715    706
>
>ibal
>threads     1      2      4
>conn/sec    1454   1854   1823
>accept      386    536    568
>disc        457    596    772
>
>So, at 4 threads, there's only about a 10% difference, but over a 100%
>difference for a single thread.  rdma_cmatose, which is single-threaded,
>reports about 715 connections/second for winverbs.
>
>I'll try to see what the impact is of using work threads in the kernel
>driver.

I converted winverbs to use synchronous calls in the kernel.  The rates
were slightly lower.  There just isn't a lot of code in the winverbs ND
provider or winverbs library, so I'm not sure why the difference appears
so large.  I'll try to measure the socket calls used by winverbs to see
how much time is spent there.  That's really the only significant piece
of code I can see.
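
For context, the work-thread variant of that kernel path just defers
each CM event to a system work item instead of processing it inline.
It boils down to roughly this pattern, in WDM terms (QueueCmEvent,
CM_EVENT_CTX, and the event payload are made-up names for illustration,
not the actual winverbs code):

    #include <wdm.h>

    /* Hypothetical context carried to the worker; not winverbs code. */
    typedef struct _CM_EVENT_CTX {
        PIO_WORKITEM WorkItem;
        PVOID Event;                /* stand-in for the CM event data */
    } CM_EVENT_CTX, *PCM_EVENT_CTX;

    /* Runs later on a system worker thread at PASSIVE_LEVEL. */
    static VOID ProcessCmEventWorker(PDEVICE_OBJECT DeviceObject, PVOID Context)
    {
        PCM_EVENT_CTX ctx = (PCM_EVENT_CTX) Context;
        UNREFERENCED_PARAMETER(DeviceObject);

        /* ... handle the connection event here ... */

        IoFreeWorkItem(ctx->WorkItem);
        ExFreePoolWithTag(ctx, 'EmCw');
    }

    /* Event callback: queue the event to a work thread and return,
     * rather than handling it synchronously. */
    static NTSTATUS QueueCmEvent(PDEVICE_OBJECT DeviceObject, PVOID Event)
    {
        PCM_EVENT_CTX ctx;

        ctx = ExAllocatePoolWithTag(NonPagedPool, sizeof(*ctx), 'EmCw');
        if (ctx == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        ctx->WorkItem = IoAllocateWorkItem(DeviceObject);
        if (ctx->WorkItem == NULL) {
            ExFreePoolWithTag(ctx, 'EmCw');
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        ctx->Event = Event;
        IoQueueWorkItem(ctx->WorkItem, ProcessCmEventWorker,
                        DelayedWorkQueue, ctx);
        return STATUS_PENDING;
    }

The synchronous version simply skips the allocation, queuing, and
context switch.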

In any case, after looking closer at ndconn and what it does, I think we
should just ignore the performance numbers that it reports.  The numbers
are not very reproducible or accurate; literally thousands of connections
can form outside of the test's timing window.
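
If we did want numbers we could trust from a test like this, the timing
would have to bracket complete connect cycles under a single clock, so
nothing can complete outside the measured window.  A rough sketch of
that idea (do_one_connection is just a placeholder for the provider's
connect/accept/disconnect sequence, not real ndconn code):

    #include <windows.h>
    #include <stdio.h>

    /* Placeholder for one full connect/accept/disconnect cycle; a real
     * test would drive the ND or CM calls here. */
    static void do_one_connection(void)
    {
    }

    /* Times a fixed number of complete cycles under one clock, so no
     * connection can form outside the measured window. */
    static double measure_conn_per_sec(int iterations)
    {
        LARGE_INTEGER freq, t0, t1;
        int i;

        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);
        for (i = 0; i < iterations; i++)
            do_one_connection();
        QueryPerformanceCounter(&t1);
        return iterations * (double) freq.QuadPart /
               (double) (t1.QuadPart - t0.QuadPart);
    }

    int main(void)
    {
        printf("%.0f conn/sec\n", measure_conn_per_sec(1000));
        return 0;
    }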

The ibal ND provider is likely to perform better than the winverbs ND
provider, since it pokes its fingers into the guts of the IB CM.  I
haven't noticed any application-level difference yet.  If we use the
2-thread numbers from above, then there would only be about a 1-second
difference establishing connections across 4000 MPI ranks.
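
To spell out the arithmetic: at the 2-thread rates above, 4000
connections take roughly 4000 / 1294 ≈ 3.1 seconds with winverbs versus
4000 / 1854 ≈ 2.2 seconds with ibal, or about a 1-second difference.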
 
- Sean
