On Thu, 2005-03-31 at 16:41, Libor Michalek wrote:
> On Thu, Mar 31, 2005 at 04:25:28PM -0500, Hal Rosenstock wrote:
> > On Wed, 2005-03-30 at 19:43, Libor Michalek wrote:
> > > The program has a decent help for available parameters, but here are
> > > some reasonable defaults:
> > >
> > > server:
> > >
> > > ./ttcp.aio.x -r -l 65536 -a 20
> > >
> > > client:
> > >
> > > ./ttcp.aio.x -t -l 65536 -n 100000 -a 20 192.168.0.100
> >
> > Are these the parameters used to achieve the throughput numbers you
> > published ?
> >
> > Sounds like you tweaked the numbers in sdp_dev.h. Anywhere else ?
> >
> > Can you provide the tuning numbers used and where they were found so these
> > results can be reproduced ?
>
> No tweaking or changes to the SDP code itself. The parameters above
> should give similar results, but here are the exact parameters I used
> for the two async tests I mentioned in the original results I posted.
>
> > > For async socket I kept 20 96K buffers in flight. For the FMR pool cache
> > > hit async results I used only 20 different buffers.
>
> ./ttcp.aio.x -r -l 98304 -a 20 -f M
> ./ttcp.aio.x -t -l 98304 -n 200000 -a 20 -f M 192.168.0.100
>
> > > For the FMR pool cache miss async results I used 1000 different
> > > buffers, of which only 20 were in flight at a time.
>
> ./ttcp.aio.x -r -l 98304 -a 20 -x 1000 -f M
> ./ttcp.aio.x -t -l 98304 -n 200000 -a 20 -x 1000 -f M 192.168.0.100

We are seeing issues with both buffer size and iterations. We get back
-ENOMEM and also see VMA lock errors. Are the two related? Should we turn
on SDP debug to see what specifically can't be allocated? In that case,
what could be done?

When using the default parameters, we see the following.

On the server:

[EMAIL PROTECTED] ~]# ./ttcp.aio.x -r -l 65536 -a 20
ttcp-r: buflen = 65536 nbuf = 0 align = 16384/0 port = 5001
ttcp-r: socket
ttcp-r: accept from 192.168.1.4
ttcp-r: Event error <-12> <5275648>
ttcp-r: 0 bytes in 0.00 real seconds = 0.00 Mbit/sec +++
ttcp-r: 2 I/O calls, usec/call = 112.00, calls/sec = 8928.57
ttcp-r: user: 0 sys: 0 total: 0 real: 224 (microseconds)
[EMAIL PROTECTED] ~]#

On the client:

[EMAIL PROTECTED] ~]# ./ttcp.aio.x -t -l 65536 -n 100000 -a 20 192.168.1.3
ttcp-t: buflen = 65536 nbuf = 100000 align = 16384/0 port = 5001 192.168.1.3
ttcp-t: socket
ttcp-t: connect
ttcp-t: Event error <-12> <5275648>
ttcp-t: 0 bytes in 0.00 real seconds = 0.00 Mbit/sec +++
ttcp-t: 2 I/O calls, usec/call = 83.00, calls/sec = 12048.19
ttcp-t: user: 0 sys: 0 total: 0 real: 166 (microseconds)
[EMAIL PROTECTED] ~]#

Here's the dmesg output from the server:

ERR: : VMA lock <620000:65536> error <-12> <16:0:8>
ERR: : VMA lock <634000:65536> error <-12> <16:0:8>
ERR: : VMA lock <648000:65536> error <-12> <16:0:8>
...<repeats>...

Here's the dmesg output from the client:

ERR: : VMA lock <580000:65536> error <-12> <16:0:8>
ERR: : VMA lock <594000:65536> error <-12> <16:0:8>
ERR: : VMA lock <5a8000:65536> error <-12> <16:0:8>
...<repeats>...

If the value of -l (the length of the network read/write buffers) is
reduced, it runs (with buffer sizes up to 4K). However, there is still
dmesg output on the server side.

Here's the dmesg output from the server:

ERR: : VMA lock <550000:1024> error <-12> <1:8:8>
ERR: : VMA lock <554000:1024> error <-12> <1:8:8>
ERR: : VMA lock <558000:1024> error <-12> <1:8:8>
WARN: : Cancel read with no IOCB. <2:0:00000005>
WARN: : Cancel read with no IOCB. <2:0:00000005>
ERR: : VMA lock <528000:1024> error <-12> <1:8:8>
ERR: : VMA lock <52c000:1024> error <-12> <1:8:8>
...<repeats>...

Is this related to system configuration somehow? How much system memory
is in your machines? Is this a factor?

Thanks.

-- Hal

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
