> Here is one set of test results. This is between 2 2-processor systems
> with mthca.
>
> winverbs
> threads     1     2     4
> conn/sec    644   1294  1661
> accept      1363  1228  1566
> disc        814   715   706
>
> ibal
> threads     1     2     4
> conn/sec    1454  1854  1823
> accept      386   536   568
> disc        457   596   772
>
> So, at 4 threads, there's only about a 10% difference, but over a 100%
> difference for a single thread. rdma_cmatose, which is single threaded,
> reports about 715 connections / second for winverbs.
>
> I'll try to see what the impact is of using work threads in the kernel
> driver.
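For context on what these numbers mean: conn/sec is just completed connect/disconnect cycles divided by a timed window, along the lines of the sketch below. This is only a minimal illustration; do_connect_disconnect() is a placeholder for whatever the test actually runs (rdma_cmatose, for instance, drives an rdma_cm connect/disconnect cycle), and the real tests are more involved. The part that matters is where the timing bracket sits.

/* Sketch of deriving a connections-per-second figure.
 * do_connect_disconnect() is a stand-in for the real work. */
#include <stdio.h>
#include <time.h>

static void do_connect_disconnect(void)
{
        /* placeholder: e.g. an rdma_cm connect/disconnect cycle */
}

int main(void)
{
        const int iters = 4000;
        struct timespec start, end;

        /* Start the clock just before the first connect attempt... */
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < iters; i++)
                do_connect_disconnect();
        /* ...and stop it just after the last disconnect completes.
         * If connections can complete outside this window, the
         * reported rate is meaningless. */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%d connections in %.3f s = %.0f conn/sec\n",
               iters, secs, iters / secs);
        return 0;
}

That timing-bracket caveat is exactly the problem with ndconn noted below.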
I converted winverbs to use synchronous calls in the kernel. The rates were slightly lower. There just isn't a lot of code in the winverbs ND provider or the winverbs library, so I'm not sure why the difference appears so large. I'll try to measure the socket calls used by winverbs to see how much time is spent there; that's really the only significant piece of code I can see.

In any case, after looking closer at ndconn and what it does, I think we should just ignore the performance numbers that it reports. The numbers are not very reproducible or accurate. Literally thousands of connections can form outside of the timing of the test.

The ibal ND provider is likely to perform better than the winverbs ND provider, since it pokes its fingers into the guts of the IB CM. I haven't noticed any application-level difference yet. If we use the 2-threaded numbers from above, then there would only be about a 1 second difference establishing connections across 4000 MPI ranks.

- Sean
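P.S. The 1 second figure falls out of the 2-thread column above: 4000 ranks / 1294 conn/sec is about 3.1 seconds for winverbs, versus 4000 / 1854, about 2.2 seconds, for ibal; the difference is roughly 0.9 seconds.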
