Re: zfs-related(?) panic in cache_enter: wrong vnode type
On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote: (kgdb) bt #0 doadump (textdump=1) at pcpu.h:224 #1 0x804f6d3b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0x804f63e9 in panic (fmt=0x104 Address 0x104 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:635 #3 0x80585f46 in cache_enter (dvp=0xfe003d4763c0, vp=0xfe0142517000, cnp=0xff82393b3708) at /usr/src/sys/kern/vfs_cache.c:726 #4 0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0, nm=0xff82393b3140 .., vpp=0xff82393b36e0, cnp=0xff82393b3708, nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470 Which FreeBSD version is it? The line numbers don't seem to match either HEAD or stable/8. It would be best if you could send me a few lines around zfs_vnops.c:1470 and vfs_cache.c:726. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
Re: Dog Food tm
On Tue, 06 Dec 2011 13:51:05 -0800, Sean Bruno wrote: Was trying to use gmirror(4) or zfs(4) today to get a machine in the cluster set up with s/w raid and was completely flummoxed by the intricacies of manual setup. Chances are, I just am not smart enough to wind my way through the various how-tos and wiki pages that I've been browsing to get the job done. Why use gmirror under zfs, when zfs itself supports software raid? If someone wants to work on modifying bsdinstaller to do s/w raid via one of these mechanisms, clusteradm@ can provide you a two-disk SATA machine that can be used for this purpose. Sean -- Kind regards Daniel ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: zfs-related(?) panic in cache_enter: wrong vnode type
on 08/12/2011 10:13 Pawel Jakub Dawidek said the following: On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote: (kgdb) bt #0 doadump (textdump=1) at pcpu.h:224 #1 0x804f6d3b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0x804f63e9 in panic (fmt=0x104 Address 0x104 out of bounds) at /usr/src/sys/kern/kern_shutdown.c:635 #3 0x80585f46 in cache_enter (dvp=0xfe003d4763c0, vp=0xfe0142517000, cnp=0xff82393b3708) at /usr/src/sys/kern/vfs_cache.c:726 #4 0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0, nm=0xff82393b3140 .., vpp=0xff82393b36e0, cnp=0xff82393b3708, nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470 Which FreeBSD version is it? The line numbers don't seem to match either HEAD or stable/8. It would be best if you could send me a few lines around zfs_vnops.c:1470 and vfs_cache.c:726. It's recent svn head, r228017, just with some local unrelated modifications. 1458 #ifdef FREEBSD_NAMECACHE 1459 /* 1460 * Insert name into cache (as non-existent) if appropriate. 1461 */ 1462 if (error == ENOENT && (cnp->cn_flags & MAKEENTRY) && nameiop != CREATE) 1463 cache_enter(dvp, *vpp, cnp); 1464 /* 1465 * Insert name into cache if appropriate. 1466 */ 1467 if (error == 0 && (cnp->cn_flags & MAKEENTRY)) { 1468 if (!(cnp->cn_flags & ISLASTCN) || 1469 (nameiop != DELETE && nameiop != RENAME)) { 1470 cache_enter(dvp, *vpp, cnp); 1471 } 1472 } 1473 #endif 716 if (flag == NCF_ISDOTDOT) { 717 /* 718 * See if we are trying to add .. entry, but some other lookup 719 * has populated v_cache_dd pointer already. 
720 */ 721 if (dvp->v_cache_dd != NULL) { 722 CACHE_WUNLOCK(); 723 cache_free(ncp); 724 return; 725 } 726 KASSERT(vp == NULL || vp->v_type == VDIR, 727 ("wrong vnode type %p", vp)); 728 dvp->v_cache_dd = ncp; 729 } -- Andriy Gapon
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On 07.12.11 22:23, Luigi Rizzo wrote: Sorry, forgot to mention that the above is with TSO DISABLED (which is not the default). TSO seems to have a very bad interaction with HWCSUM and non-zero mitigation. I have this on both sender and receiver # ifconfig ix1 ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=4bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO> ether 00:25:90:35:22:f1 inet 10.2.101.11 netmask 0xff00 broadcast 10.2.101.255 media: Ethernet autoselect (autoselect full-duplex) status: active without LRO on either end # nuttcp -t -T 5 -w 128 -v 10.2.101.11 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms nuttcp-t: send window size = 131768, receive window size = 66608 nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 2990.7170 Mbps nuttcp-t: host-retrans = 0 nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44 nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 10.2.101.12 nuttcp-r: send window size = 33304, receive window size = 131768 nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 2918.3794 Mbps nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86 nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 230404+0csw with LRO on receiver # nuttcp -t -T 5 -w 128 -v 10.2.101.11 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms nuttcp-t: send window size = 131768, receive window size = 66608 nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 4044.3989 Mbps nuttcp-t: host-retrans = 2 nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08 
nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 10.2.101.12 nuttcp-r: send window size = 33304, receive window size = 131768 nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 3945.9174 Mbps nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98 nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 156333+0csw About 1/4 improvement... With LRO on both sender and receiver # nuttcp -t -T 5 -w 128 -v 10.2.101.11 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp - 10.2.101.11 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms nuttcp-t: send window size = 131768, receive window size = 66608 nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 4320.4840 Mbps nuttcp-t: host-retrans = 1 nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67 nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 10.2.101.12 nuttcp-r: send window size = 33304, receive window size = 131768 nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 4215.4829 Mbps nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34 nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 188794+147csw Even better... 
With LRO on sender only: # nuttcp -t -T 5 -w 128 -v 10.2.101.11 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp - 10.2.101.11 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms nuttcp-t: send window size = 131768, receive window size = 66608 nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 3471.2847 Mbps nuttcp-t: host-retrans = 0 nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01 nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 10.2.101.12 nuttcp-r: send window size = 33304, receive window size = 131768 nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 3386.6984 Mbps nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67 nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 117367+0csw also remember that hw.ixgbe.max_interrupt_rate has only effect at module load -- i.e. you set it with the bootloader, or with kenv before loading the module. I have this in /boot/loader.conf kern.ipc.nmbclusters=512000 hw.ixgbe.max_interrupt_rate=0 on both sender and receiver. Please retry the measurements disabling tso (on both sides, but it really matters only on the sender). Also, LRO requires HWCSUM. How do I set HWCSUM? Is this different
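On Daniel's closing question: on FreeBSD, the checksum offload that drivers report as HWCSUM is toggled per interface with ifconfig's rxcsum/txcsum flags, alongside tso and lro. A dry-run sketch — the commands are echoed rather than executed (ix1 matches the interface used in the tests above):

```shell
# Dry-run: print the ifconfig invocations that flip offload features on ix1.
offload_cmds() {
    echo "ifconfig ix1 rxcsum txcsum"    # enable RX/TX checksum offload (HWCSUM)
    echo "ifconfig ix1 -tso"             # disable TSO, as Luigi asks for the retest
    echo "ifconfig ix1 lro"              # enable LRO (requires HWCSUM)
}
offload_cmds
```

The current settings show up in the interface's options=...<...> line, as in the ifconfig output quoted earlier.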
Re: Dog Food tm
On 12/08/2011 09:35 AM, Daniel Gerzo wrote: On Tue, 06 Dec 2011 13:51:05 -0800, Sean Bruno wrote: Was trying to use gmirror(4) or zfs(4) today to get a machine in the cluster setup with s/w raid and was completely flummoxed by the intricacies of manual setup. Chances are, I just am not smart enough to wind my way though the various how tos and wiki pages that I've been browsing to get the job done. Why using gmirror under zfs, when zfs itself supports software raid? If someone wants to work on modifying bsdinstaller to do s/w raid via one of these mechanisms, clusteradm@ can provide you a two disk SATA machine that can be used for this purpose. Sean And what problems did you run into? This guide worked for me: http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror (but the zfs create ... part was too much typing, so I did it with a script that I added to the CD) Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.malo...@brockmann-consult.de Internet: http://www.brockmann-consult.de
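For reference, the core of that wiki guide is short. A dry-run sketch of the partitioning and pool-creation steps — the commands are only echoed, since running them would repartition real disks; the disk names, label scheme, and pool name are examples, and the full guide also installs boot code tuning, creates datasets, and sets bootfs:

```shell
# Dry-run sketch: echo the core RootOnZFS/GPTZFSBoot mirror commands
# rather than executing them (they would destroy data on ada0/ada1).
mirror_cmds() {
    for d in ada0 ada1; do
        echo "gpart create -s gpt $d"
        echo "gpart add -t freebsd-boot -s 128 $d"
        echo "gpart add -t freebsd-zfs -l disk-$d $d"
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d"
    done
    echo "zpool create zroot mirror /dev/gpt/disk-ada0 /dev/gpt/disk-ada1"
}
mirror_cmds
```

Mirroring whole ZFS vdevs this way sidesteps the gmirror-on-GPT metadata conflict discussed elsewhere in this thread.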
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On Thu, Dec 08, 2011 at 12:06:26PM +0200, Daniel Kalchev wrote: [snip]
Re: Burning CDs and DVDs on SATA drive in FreeBSD 9.0
from my last message: I can't get cdrtools (cdrecord or readcd) to work on FreeBSD 9.0-RC2, and now I see RC3 is available. I tried options ATA_CAM in kernel config, removing device atapicam, but still readcd -scanbus or cdrecord -scanbus refuses to work, running as root. camcontrol devlist shows my DVD drive, and I am able to mount and read /dev/cd0. Is cdrtools buggy, or maybe the FreeBSD port is buggy? Daniel O'Connor docon...@gsoft.com.au responded: Define refuses to work.. I have an oldish 9.0-CURRENT which works with cdrecord.. [titus 1:16] ~ cdrecord -scanbus Cdrecord-ProDVD-ProBD-Clone 3.00 (amd64-unknown-freebsd9.0) Copyright (C) 1995-2010 Jörg Schilling Using libscg version 'schily-0.9'. scsibus1: 1,0,0 100) 'PIONEER ' 'DVD-RW DVR-112D' '1.09' Removable CD-ROM 1,1,0 101) * 1,2,0 102) * 1,3,0 103) * 1,4,0 104) * 1,5,0 105) * 1,6,0 106) * 1,7,0 107) * cdrecord -scanbus produces cdrecord: Inappropriate ioctl for device. CAMIOCOMMAND ioctl failed. Cannot open or use SCSI driver. cdrecord: For possible targets try 'cdrecord -scanbus'. cdrecord: For possible transport specifiers try 'cdrecord dev=help'. Cdrecord-ProDVD-ProBD-Clone 3.00 (amd64-unknown-freebsd9.0) Copyright (C) 1995-2010 Jörg Schilling About the same with readcd, and cdrecord dev=help is no help. from grarpamp grarp...@gmail.com: In the past, I've used the ftp cdrtools pkg (made from the port of course) and it failed to work. It's a popular tool so my machine was probably out of sync. Same with burncd. However, compiling the current cdrtools source worked fine. So I'd try that first, compare, and send up a bug if need be. Try to skip the scan by specifying the BTL or devpath on the command line. The scan is a big part of the port and might have breakage, at least for the app below. Also, if you're doing audio, someone over on ports has said they're doing an update to cdparanoia. It's minor, but useful for that crowd. makefs and burncd are part of the base, at least on RELENG_8. 
And makefs is used in the official releases. So they should just work. Good luck. You mean build cdrtools, or possibly cdrkit, directly from the source outside the FreeBSD ports collection. I could then see if I could make that into my own port, or else install to prefix /usr/local2. If there is a conflict between cdrkit and cdrtools, maybe install the other to prefix /usr/local3? Then I could spawn a subshell with /usr/local2/bin or /usr/local3/bin added to the PATH. If I get something to work, I could report back to this ports mailing list and share the benefits with others. Tom
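grarpamp's suggestion to skip the scan means naming the target explicitly on the cdrecord command line. A dry-run sketch — commands are echoed, not executed; the BTL triple and image name are examples taken from Daniel's -scanbus listing, and whether a raw device path is accepted depends on the cdrecord build:

```shell
# Dry-run: print cdrecord invocations that bypass -scanbus by naming
# the target directly (bus,target,lun triple or a device path).
burn_cmds() {
    echo "cdrecord dev=1,0,0 -v -dao image.iso"
    echo "cdrecord dev=/dev/cd0 -v -dao image.iso"
}
burn_cmds
```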
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On 12/08/11 05:08, Luigi Rizzo wrote: On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote: On 06.12.2011 22:06, Luigi Rizzo wrote: ... Even in my experiments there is a lot of instability in the results. I don't know exactly where the problem is, but the high number of read syscalls, and the huge impact of setting interrupt_rate=0 (defaults at 16us on the ixgbe) makes me think that there is something that needs investigation in the protocol stack. Of course we don't want to optimize specifically for the one-flow-at-10G case, but devising something that makes the system less affected by short timing variations, and can pass upstream interrupt mitigation delays would help. I'm not sure the variance is only coming from the network card and driver side of things. The TCP processing and interactions with scheduler and locking probably play a big role as well. There have been many changes to TCP recently and maybe an inefficiency that affects high-speed single sessions throughput has crept in. That's difficult to debug though. I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro), different window sizes and interrupt mitigation configurations. default latency is 16us, l=0 means no interrupt mitigation. lro is the software implementation of lro (tcp_lro.c) hwlro is the hardware one (on 82599). Using a window of 100 Kbytes seems to give the best results. Summary: [snip] - enabling software lro on the transmit side actually slows down the throughput (4-5Gbit/s instead of 8.0). I am not sure why (perhaps acks are delayed too much) ? Adding a couple of lines in tcp_lro to reject pure acks seems to have much better effect. The tcp_lro patch below might actually be useful also for other cards. 
--- tcp_lro.c (revision 228284) +++ tcp_lro.c (working copy) @@ -245,6 +250,8 @@ ip_len = ntohs(ip->ip_len); tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip); + if (tcp_data_len == 0) + return -1; /* not on ack */ /* There is a bug with our LRO implementation (first noticed by Jeff Roberson) that I started fixing some time back but dropped the ball on. The crux of the problem is that we currently only send an ACK for the entire LRO chunk instead of all the segments contained therein. Given that most stacks rely on the ACK clock to keep things ticking over, the current behaviour kills performance. It may well be the cause of the performance loss you have observed. WIP patch is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware set up locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch. If anyone has any ideas what I'm missing in the patch to make it work, please let me know. Cheers, Lawrence
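The arithmetic behind the patch's pure-ACK test can be sanity-checked in isolation. A sketch (shell standing in for the C; it assumes IPv4 without options, so sizeof(*ip) is 20 bytes, and th_off counts 32-bit words as in the TCP header):

```shell
# tcp_data_len as computed in tcp_lro_rx(): IP total length minus the
# TCP header length (th_off in 32-bit words) minus the 20-byte IP header.
tcp_data_len() {
    ip_len=$1 th_off=$2
    echo $(( ip_len - (th_off << 2) - 20 ))
}
tcp_data_len 40 5     # pure ACK: 20 IP + 20 TCP, no payload -> 0, rejected from LRO
tcp_data_len 1500 5   # full-MTU data segment -> 1460 bytes, eligible for merging
```

A zero result is exactly the case the added `return -1` bails out on, so pure ACKs are passed up immediately instead of being coalesced.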
Re: [RFC] VIA south bridge watchdog support
Hi, If someone wants to take a look: I've added support for the VIA south bridge watchdog. It has been tested on VX900 but should work with VX800, VX855, and CX700. http://people.freebsd.org/~fabient/patch-watchdog-via-rev1 Posted rev2: http://people.freebsd.org/~fabient/patch-watchdog-via-rev2 - add / test VT8251 - man page update - reset FIRED bit when detected Other south bridges may be supported as well, but feedback is needed. Fabien
Re: Burning CDs and DVDs on SATA drive in FreeBSD 9.0
On 12/08/2011 01:25, Thomas Mueller wrote: I can't get cdrtools (cdrecord or readcd) to work on FreeBSD 9.0-RC2, and now I see RC3 is available. I tried options ATA_CAM in kernel config, removing device atapicam, but still readcd -scanbus or cdrecord -scanbus refuses to work, running as root. camcontrol devlist shows my DVD drive, and I am able to mount and read /dev/cd0. Is cdrtools buggy, or maybe the FreeBSD port is buggy? I see from the Makefile that sysutils/cdrdao is broken on FreeBSD 9.x, does not link. I downloaded cdrkit-1.1.11.tar.gz so as to extract and have a look at it, and possibly build it on my own, outside the ports system. I see the version in ports system is 1.1.9, maybe 1.1.11 could do better? Maybe the FreeBSD developers were too hasty to deprecate burncd? Recompile the port; the CAM ioctl numbers have changed. Cheers Michiel
Re: datapoints on 10G throughput with TCP ?
On Mon, Dec 05, 2011 at 08:27:03PM +0100, Luigi Rizzo wrote: Hi, I am trying to establish the baseline performance for 10G throughput over TCP, and would like to collect some data points. As a testing program i am using nuttcp from ports (as good as anything, i guess -- it is reasonably flexible, and if you use it in TCP with relatively large writes, the overhead for syscalls and gettimeofday() shouldn't kill you). I'd be very grateful if you could do the following test: - have two machines connected by a 10G link - on one run nuttcp -S - on the other one run nuttcp -t -T 5 -w 128 -v the.other.ip and send me a dump of the output, such as the one(s) at the end of the message. I am mostly interested in two configurations: - one over loopback, which should tell how fast is the CPU+memory As an example, one of my machines does about 15 Gbit/s, and one of the faster ones does about 44 Gbit/s - one over the wire using 1500 byte mss. Here it really matters how good is the handling of small MTUs. As a data point, on my machines i get 2..3.5 Gbit/s on the slow machine with a 1500 byte mtu and default card setting. Clearing the interrupt mitigation register (so no mitigation) brings the rate to 5-6 Gbit/s. Same hardware with linux does about 8 Gbit/s. HEAD seems 10-20% slower than RELENG_8 though i am not sure who is at fault. The receive side is particularly critical - on FreeBSD the receiver is woken up every two packets (do the math below, between the number of rx calls and throughput and mss), resulting in almost 200K activations per second, and despite the fact that interrupt mitigation is set to a much lower value (so incoming packets should be batched). On linux, i see much fewer reads, presumably the process is woken up only at the end of a burst. 
About relative performance FreeBSD and Linux I wrote in -performance@ at Jan'11 (Interrupt performance) EXAMPLES OF OUTPUT -- nuttcp -t -T 5 -w 128 -v 10.0.1.2 nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp - 10.0.1.2 nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 10.0.1.2 with mss=1460, RTT=0.103 ms nuttcp-t: send window size = 131400, receive window size = 65700 nuttcp-t: 3095.0982 MB in 5.00 real seconds = 633785.85 KB/sec = 5191.9737 Mbps nuttcp-t: host-retrans = 0 nuttcp-t: 49522 I/O calls, msec/call = 0.10, calls/sec = 9902.99 nuttcp-t: 0.0user 2.7sys 0:05real 54% 100i+2639d 752maxrss 0+3pf 258876+6csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 10.0.1.4 nuttcp-r: send window size = 33580, receive window size = 131400 nuttcp-r: 3095.0982 MB in 5.17 real seconds = 613526.42 KB/sec = 5026.0084 Mbps nuttcp-r: 1114794 I/O calls, msec/call = 0.00, calls/sec = 215801.03 nuttcp-r: 0.1user 3.5sys 0:05real 69% 112i+1104d 626maxrss 0+15pf 507653+188csw nuttcp -t -T 5 -w 128 -v localhost nuttcp-t: v6.1.2: socket nuttcp-t: buflen=65536, nstream=1, port=5001 tcp - localhost nuttcp-t: time limit = 5.00 seconds nuttcp-t: connect to 127.0.0.1 with mss=14336, RTT=0.051 ms nuttcp-t: send window size = 143360, receive window size = 71680 nuttcp-t: 26963.4375 MB in 5.00 real seconds = 5521440.59 KB/sec = 45231.6413 Mbps nuttcp-t: host-retrans = 0 nuttcp-t: 431415 I/O calls, msec/call = 0.01, calls/sec = 86272.51 nuttcp-t: 0.0user 4.6sys 0:05real 93% 102i+2681d 774maxrss 0+3pf 2510+1csw nuttcp-r: v6.1.2: socket nuttcp-r: buflen=65536, nstream=1, port=5001 tcp nuttcp-r: accept from 127.0.0.1 nuttcp-r: send window size = 43008, receive window size = 143360 nuttcp-r: 26963.4375 MB in 5.20 real seconds = 5313135.74 KB/sec = 43525.2080 Mbps nuttcp-r: 767807 I/O calls, msec/call = 0.01, calls/sec = 147750.09 nuttcp-r: 0.1user 3.9sys 0:05real 79% 98i+2570d 772maxrss 0+16pf 311014+8csw on the server, 
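The "do the math" aside can be made explicit. A sketch using the receiver figures from the wire example above (goodput 5026.0084 Mbps, mss 1460, 215801.03 I/O calls per second); the helper name is mine, not from the thread:

```shell
# Estimate how many MSS-sized segments arrive per read() wakeup on the
# receiver, from nuttcp's reported goodput and I/O call rate.
pkts_per_read() {
    mbps=$1 mss=$2 reads=$3
    awk -v m="$mbps" -v s="$mss" -v r="$reads" 'BEGIN {
        pps = m * 1e6 / 8 / s            # data segments per second
        printf "%.2f\n", pps / r         # segments handed up per read()
    }'
}
pkts_per_read 5026.0084 1460 215801      # the 10G wire test above
```

This prints 1.99: just under two full segments per read(), matching the observation that the receiver is woken every two packets.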
run
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote: On 12/08/11 05:08, Luigi Rizzo wrote: ... I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro), different window sizes and interrupt mitigation configurations. default latency is 16us, l=0 means no interrupt mitigation. lro is the software implementation of lro (tcp_lro.c) hwlro is the hardware one (on 82599). Using a window of 100 Kbytes seems to give the best results. Summary: [snip] - enabling software lro on the transmit side actually slows down the throughput (4-5Gbit/s instead of 8.0). I am not sure why (perhaps acks are delayed too much) ? Adding a couple of lines in tcp_lro to reject pure acks seems to have much better effect. The tcp_lro patch below might actually be useful also for other cards. --- tcp_lro.c (revision 228284) +++ tcp_lro.c (working copy) @@ -245,6 +250,8 @@ ip_len = ntohs(ip->ip_len); tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip); + if (tcp_data_len == 0) + return -1; /* not on ack */ /* There is a bug with our LRO implementation (first noticed by Jeff Roberson) that I started fixing some time back but dropped the ball on. The crux of the problem is that we currently only send an ACK for the entire LRO chunk instead of all the segments contained therein. Given that most stacks rely on the ACK clock to keep things ticking over, the current behaviour kills performance. It may well be the cause of the performance loss you have observed. I should clarify better. First of all, i tested two different LRO implementations: our Software LRO (tcp_lro.c), and the Hardware LRO which is implemented by the 82599 (called RSC or receive-side-coalescing in the 82599 data sheets). Jack Vogel and Navdeep Parhar (both in Cc) can probably comment on the logic of both. 
In my tests, either SW or HW LRO on the receive side HELPED A LOT, not just in terms of raw throughput but also in terms of system load on the receiver. On the receive side, LRO packs multiple data segments into one that is passed up the stack. As you mentioned this also reduces the number of acks generated, but not dramatically (consider, the LRO is bounded by the number of segments received in the mitigation interval). In my tests the number of reads() on the receiver was reduced by approx a factor of 3 compared to the !LRO case, meaning 4-5 segment merged by LRO. Navdeep reported some numbers for cxgbe with similar numbers. Using Hardware LRO on the transmit side had no ill effect. Being done in hardware i have no idea how it is implemented. Using Software LRO on the transmit side did give a significant throughput reduction. I can't explain the exact cause, though it is possible that between reducing the number of segments to the receiver and collapsing ACKs that it generates, the sender starves. But it could well be that it is the extra delay on passing up the ACKs that limits performance. Either way, since the HW LRO did a fine job, i was trying to figure out whether avoiding LRO on pure acks could help, and the two-line patch above did help. Note, my patch was just a proof-of-concept, and may cause reordering if a data segment is followed by a pure ack. But this can be fixed easily, handling a pure ack as an out-of-sequence packet in tcp_lro_rx(). WIP patch is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch. a few comments: 1. 
i don't think it makes sense to send multiple acks on coalesced segments (and the 82599 does not seem to do that). First of all, the acks would get out with minimal spacing (ideally less than 100ns) so chances are that the remote end will see them in a single burst anyways. Secondly, the remote end can easily tell that a single ACK is reporting multiple MSS and behave as if an equivalent number of acks had arrived. 2. i am a big fan of LRO (and similar solutions), because it can save a lot of repeated work when passing packets up the stack, and the mechanism becomes more and more effective as the system load increases, which is a wonderful property in terms of system stability. For this reason, i think it would be useful to add support for software LRO in the generic code (sys/net/if.c) so that drivers can directly use the software implementation even without hardware support. 3. similar to LRO, it would make sense to implement a software TSO mechanism where
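Luigi's point 1 amounts to byte counting: the sender can infer how many segments a single cumulative ACK covers from the sequence numbers instead of counting ACK arrivals (the idea RFC 3465's Appropriate Byte Counting formalizes). A hedged sketch of that inference, with names of my own choosing:

```shell
# How many MSS one cumulative ACK acknowledges: compare the ACKed
# sequence number against snd_una and divide by the MSS (ceiling).
acked_segments() {
    ack=$1 snd_una=$2 mss=$3
    echo $(( (ack - snd_una + mss - 1) / mss ))
}
acked_segments 7300 0 1460    # one ACK covering five coalesced 1460-byte segments
```

The result is 5, so a sender that credits its congestion window by bytes acknowledged behaves as if five separate ACKs had arrived, which is why a single ACK per LRO chunk need not stall the ACK clock.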
Re: Stop scheduler on panic
On 12/4/11 5:11 PM, Andriy Gapon wrote:

on 02/12/2011 17:30 m...@freebsd.org said the following:

On Fri, Dec 2, 2011 at 2:05 AM, Andriy Gapona...@freebsd.org wrote:

on 02/12/2011 06:36 John Baldwin said the following:

Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true when kdb was active). But I think these two changes should cover critical_exit() ok.

I attempted to start a discussion about this a few times already :-) Should we treat kdb context the same as SCHEDULER_STOPPED context (in the current definition)? That is, skip all locks in the same fashion? There are pros and cons.

Does kdb pause all CPUs with an interrupt (NMI or regular interrupt, I can no longer remember...) when it enters? If so, then I'd say whether it enters via sysctl or panic doesn't matter. It's in a special environment where nothing else is running, which is what is needed for proper exploration of the machine (via breakpoint, for debugging a hang, etc). Maybe the question is, why wouldn't SCHEDULER_STOPPED be true regardless of how kdb is entered?

I think that the discussion that followed has clarified this point a bit. SCHEDULER_STOPPED perhaps needs a better name :-) Currently it, the name, reflects the state of the scheduler, but not why the scheduler is stopped and not the greater state of the system (in panic), nor how we should handle that state (bypass locking). So I'd love something like BYPASS_LOCKING_BECAUSE_SCHEDULER_IS_STOPPED_IN_PANIC, had it not been so unwieldy :)

Oh, hmm. Yes, being in the debugger should not potentially corrupt lock state, so in that sense it is a weaker stop.

-- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Dog Food tm
On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:

And what problems did you run into?

More or less, trying to do gmirror(4) style mirroring on GPT partitions doesn't work. See http://www.freebsd.org/doc/handbook/geom-mirror.html for the BIG RED WARNING that says why.

This guide worked for me: http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror

That, along with a lot of how to's, is out of date in the FreeBSD 9 world. I would suspect that my experience of attempting to setup a mirrored volume won't be unique. BSDInstaller and its predecessor Sysinstall don't have any code to create or destroy zfs(4) or geom(4) volumes. So, the amount of exposure to real users is approaching 0 in comparison to the number of people who really do use FreeBSD.

I have my hands full with other projects at the moment, but I'm more than happy to grant access to a two disk SATA server if someone wants to enhance BSDInstall with zfs(4) or geom(4) volume management features. At a minimum, you *should* be able to take 2 disks and make a mirrored volume with either tool.

Sean
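For reference, the zfs(4) half of what is being asked for is small once a shell is available. A sketch with placeholder device names (verify against zpool(8) before trusting it):

    # zpool create tank mirror ada0 ada1
    # zpool status tank

An installer would still need to handle boot blocks and the root dataset, which is the part the RootOnZFS wiki guide mentioned above walks through.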
Re: Dog Food tm
On Thu, Dec 8, 2011 at 3:55 PM, Sean Bruno sean...@yahoo-inc.com wrote:

On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:

And what problems did you run into?

More or less, trying to do gmirror(4) style mirroring on GPT partitions doesn't work. See http://www.freebsd.org/doc/handbook/geom-mirror.html for the BIG RED WARNING that says why.

Er, gmirroring GPT _partitions_ works just fine. It is when you try to gmirror an entire disk that is partitioned with GPT that you have issues, as gmirror trashes the secondary GPT table at the end of the disk. You do not have that issue with individual GPT partitions.

Cheers
Tom
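A sketch of the partition-level approach Tom describes, with placeholder device names and sizes (check gpart(8) and gmirror(8), and the handbook warning above, before running anything like this):

    # gpart create -s gpt ada0
    # gpart add -t freebsd-ufs -s 100G ada0
    # gpart create -s gpt ada1
    # gpart add -t freebsd-ufs -s 100G ada1
    # gmirror label -v data ada0p1 ada1p1
    # newfs /dev/mirror/data

Because gmirror's metadata sector now lives at the end of each partition rather than at the end of the disk, it no longer collides with the secondary GPT table.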
Re: Dog Food tm
Sean Bruno schreef:

On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:

And what problems did you run into?

More or less, trying to do gmirror(4) style mirroring on GPT partitions doesn't work. See http://www.freebsd.org/doc/handbook/geom-mirror.html for the BIG RED WARNING that says why.

This guide worked for me: http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror

That, along with a lot of how to's, is out of date in the FreeBSD 9 world. I would suspect that my experience of attempting to setup a mirrored volume won't be unique. BSDInstaller and its predecessor Sysinstall don't have any code to create or destroy zfs(4) or geom(4) volumes. So, the amount of exposure to real users is approaching 0 in comparison to the number of people who really do use FreeBSD.

I have my hands full with other projects at the moment, but I'm more than happy to grant access to a two disk SATA server if someone wants to enhance BSDInstall with zfs(4) or geom(4) volume management features. At a minimum, you *should* be able to take 2 disks and make a mirrored volume with either tool.

Sean

Also a good guide is here: http://unix-heaven.org/node/24. And gmirroring GPT partitions is no problem; only with whole disks do problems arise.

regards
Johan Hendriks
Re: Dog Food tm
On 08/12/2011 15:55, Sean Bruno wrote:

BSDInstaller and its predecessor Sysinstall don't have any code to create or destroy zfs(4) or geom(4) volumes. So, the amount of exposure to real users is approaching 0 in comparison to the number of people who really do use FreeBSD. I have my hands full with other projects at the moment, but I'm more than happy to grant access to a two disk SATA server if someone wants to enhance BSDInstall with zfs(4) or geom(4) volume management features.

I don't know if it supports RAID, but the geom-aware rewrite of sade(4) supports zfs: http://butcher.heavennet.ru/sade/

Unfortunately it was never completed because of the difficulties of using ncurses/dialog.

-- Bruce Cran
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On 08.12.2011 14:11, Lawrence Stewart wrote:

On 12/08/11 05:08, Luigi Rizzo wrote:

On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote:

On 06.12.2011 22:06, Luigi Rizzo wrote: ...

Even in my experiments there is a lot of instability in the results. I don't know exactly where the problem is, but the high number of read syscalls, and the huge impact of setting interrupt_rate=0 (defaults at 16us on the ixgbe) makes me think that there is something that needs investigation in the protocol stack. Of course we don't want to optimize specifically for the one-flow-at-10G case, but devising something that makes the system less affected by short timing variations, and can pass upstream interrupt mitigation delays, would help.

I'm not sure the variance is only coming from the network card and driver side of things. The TCP processing and interactions with scheduler and locking probably play a big role as well. There have been many changes to TCP recently and maybe an inefficiency that affects high-speed single sessions throughput has crept in. That's difficult to debug though.

I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro), different window sizes and interrupt mitigation configurations. default latency is 16us, l=0 means no interrupt mitigation. lro is the software implementation of lro (tcp_lro.c); hwlro is the hardware one (on 82599). Using a window of 100 Kbytes seems to give the best results.

Summary: [snip]

- enabling software lro on the transmit side actually slows down the throughput (4-5 Gbit/s instead of 8.0). I am not sure why (perhaps acks are delayed too much)? Adding a couple of lines in tcp_lro to reject pure acks seems to have a much better effect. The tcp_lro patch below might actually be useful also for other cards.
--- tcp_lro.c	(revision 228284)
+++ tcp_lro.c	(working copy)
@@ -245,6 +250,8 @@
 	ip_len = ntohs(ip->ip_len);
 	tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+	if (tcp_data_len == 0)
+		return -1;	/* not on ack */
 	/*

There is a bug with our LRO implementation (first noticed by Jeff Roberson) that I started fixing some time back but dropped the ball on. The crux of the problem is that we currently only send an ACK for the entire LRO chunk instead of all the segments contained therein. Given that most stacks rely on the ACK clock to keep things ticking over, the current behaviour kills performance. It may well be the cause of the performance loss you have observed.

WIP patch is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch. If anyone has any ideas what I'm missing in the patch to make it work, please let me know.

On low RTT's the accumulated ACKing probably doesn't make any difference. The congestion window will grow very fast anyway. On longer RTT's it sure will make a difference. Unless you have a 10Gig path with 50ms or so it's difficult to empirically test though.

-- Andre
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On 08.12.2011 16:34, Luigi Rizzo wrote:

On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote:

On 12/08/11 05:08, Luigi Rizzo wrote: ...

I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which seems slightly faster than HEAD) using MTU=1500 and various combinations of card capabilities (hwcsum,tso,lro), different window sizes and interrupt mitigation configurations. default latency is 16us, l=0 means no interrupt mitigation. lro is the software implementation of lro (tcp_lro.c); hwlro is the hardware one (on 82599). Using a window of 100 Kbytes seems to give the best results.

Summary: [snip]

- enabling software lro on the transmit side actually slows down the throughput (4-5Gbit/s instead of 8.0). I am not sure why (perhaps acks are delayed too much)? Adding a couple of lines in tcp_lro to reject pure acks seems to have much better effect. The tcp_lro patch below might actually be useful also for other cards.

--- tcp_lro.c	(revision 228284)
+++ tcp_lro.c	(working copy)
@@ -245,6 +250,8 @@
 	ip_len = ntohs(ip->ip_len);
 	tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+	if (tcp_data_len == 0)
+		return -1;	/* not on ack */
 	/*

There is a bug with our LRO implementation (first noticed by Jeff Roberson) that I started fixing some time back but dropped the ball on. The crux of the problem is that we currently only send an ACK for the entire LRO chunk instead of all the segments contained therein. Given that most stacks rely on the ACK clock to keep things ticking over, the current behaviour kills performance. It may well be the cause of the performance loss you have observed.

I should clarify better. First of all, i tested two different LRO implementations: our Software LRO (tcp_lro.c), and the Hardware LRO which is implemented by the 82599 (called RSC or receive-side-coalescing in the 82599 data sheets). Jack Vogel and Navdeep Parhar (both in Cc) can probably comment on the logic of both.
In my tests, either SW or HW LRO on the receive side HELPED A LOT, not just in terms of raw throughput but also in terms of system load on the receiver. On the receive side, LRO packs multiple data segments into one that is passed up the stack. As you mentioned this also reduces the number of acks generated, but not dramatically (consider, the LRO is bounded by the number of segments received in the mitigation interval). In my tests the number of read()s on the receiver was reduced by approx a factor of 3 compared to the !LRO case, meaning 4-5 segments merged by LRO. Navdeep reported similar numbers for cxgbe.

Using Hardware LRO on the transmit side had no ill effect. Being done in hardware i have no idea how it is implemented. Using Software LRO on the transmit side did give a significant throughput reduction. I can't explain the exact cause, though it is possible that between reducing the number of segments to the receiver and collapsing the ACKs that it generates, the sender starves. But it could well be that it is the extra delay in passing up the ACKs that limits performance. Either way, since the HW LRO did a fine job, i was trying to figure out whether avoiding LRO on pure acks could help, and the two-line patch above did help.

Note, my patch was just a proof-of-concept, and may cause reordering if a data segment is followed by a pure ack. But this can be fixed easily, handling a pure ack as an out-of-sequence packet in tcp_lro_rx().

WIP patch is at: http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch.

LRO can always be done in software. You can do it at driver, ether_input or ip_input level.

A few comments:

1. i don't think it makes sense to send multiple acks on coalesced segments (and the 82599 does not seem to do that). First of all, the acks would go out with minimal spacing (ideally less than 100ns), so chances are that the remote end will see them in a single burst anyways. Secondly, the remote end can easily tell that a single ACK is reporting multiple MSS and behave as if an equivalent number of acks had arrived.

ABC (appropriate byte counting) gets in the way though.

2. i am a big fan of LRO (and similar solutions), because it can save a lot of repeated work when passing packets up the stack, and the mechanism becomes more and more effective as the system load increases, which is a wonderful property in terms of system stability. For this reason, i think it would be useful to add support for software LRO in the generic code (sys/net/if.c) so that drivers can
Adding bool, true, and false to the kernel
Hello -current! Recently on -arch@, we discussed adding the C99 keywords bool, true, and false to the kernel. I now have patches to do this as well as fix up some build issues. The original thread was here: http://lists.freebsd.org/pipermail/freebsd-arch/2011-November/011937.html

I split the patches in three:

http://people.freebsd.org/~mdf/0001-e1000-ixgbe-fix-code-to-not-define-bool-true-false-w.patch fixes up the e1000 and ixgbe code. Jack, can you please let me know if there are any issues. This should work to build whether or not sys/types.h has the new defines; I am testing make universe right now.

http://people.freebsd.org/~mdf/0002-Fix-code-to-not-define-bool-true-false-when-already-.patch fixes the other code in the sys/ directory that gives build conflicts. Since I wasn't sure of the origin of the drivers, I conservatively left all their defines alone to allow the same driver code to build on both CURRENT and 9.0. If some of the drivers are wholly-owned by FreeBSD (unlike e1000 and ixgbe) and are not expected to be built on older releases, then the use of the old defines can be removed. If anyone knows the provenance of these files, please advise.

http://people.freebsd.org/~mdf/0003-Define-bool-true-and-false-in-types.h-for-_KERNEL-co.patch actually defines bool, true, and false, and adds some extra paranoia in stdbool.h for anyone who has hacked their local build system and project repo to include stdbool.h in a kernel file. I also bumped __FreeBSD_version, though this is probably not necessary since __bool_true_false_are_defined is a better check than the __FreeBSD_version. This code should be MFC-able to stable/9 after 9.0 is released.

Thanks,
matthew
maxio is not exported by mps(4)
Dear -hackers, can someone please enlighten me as to why maxio is not set by mps(4), and what would be a reasonable number to set it to if i felt inclined to do so? The default DFLTPHYS of 64K seems a bit low to me.

thanks,
max
Re: zfs i/o hangs on 9-PRERELEASE
Hi all,

Just wanted to report back that I found time to do more diagnostics. ZFS/FreeBSD/etc are not to blame. ZFS/FreeBSD never reported any I/O errors, and scrubs always came up clean, because the failing disk had trouble reading but would eventually return accurate data. It was sort of like having a RAIDZ with one disk that had 30s latency sometimes! Seatools confirmed this disk was failing, and after replacing the disk all issues went away.
FreeBSD 9.0-RC3 Available...
The third and what should be final Release Candidate build for the 9.0-RELEASE release cycle is now available. Since this is the beginning of a brand new branch (stable/9) I cross-post the announcements to both -current and -stable. But just so you know, most of the developers active in head and stable/9 pay more attention to the -current mailing list. If you notice problems you can report them through the normal Gnats PR system or on the -current mailing list.

This should be the last of the test builds. We hope to begin the final release builds in about a week. The 9.0-RELEASE cycle will be tracked here: http://wiki.freebsd.org/Releng/9.0TODO

The location of the FTP install tree and ISOs is the same as it has been for BETA2/BETA3/RC1/RC2. The layout to a large degree is being dictated by the new build infrastructure and installer. But it's not particularly well suited to humans, so I've added a shorter pathway to the ISOs. Unless there are lots of complaints about the layout we'll stick with this for the release. ISO images for amd64, i386, ia64, powerpc, powerpc64, and sparc64 are available here: ftp://ftp.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/9.0/

That directory is a set of symbolic links to the ISO images for all of the supported architectures, and checksum files (for example there is a symlink named CHECKSUM.MD5-amd64 that points to the CHECKSUM.MD5 file for the amd64 architecture). MD5/SHA256 checksums are tacked on below.

If you would like to use csup/cvsup mechanisms to access the source tree, the branch tag to use is now RELENG_9_0; if you use . (head) you will get 10-CURRENT. If you would like to access the source tree via SVN it is svn://svn.freebsd.org/base/releng/9.0/. We still have the nit that the creation of a new SVN branch winds up causing what looks like a check-in of the entire tree in CVS (a side-effect of the svn2cvs exporter) so mergemaster -F is your friend if you are using csup/cvsup.
FreeBSD Update -- The freebsd-update(8) utility supports binary upgrades of i386 and amd64 systems running earlier FreeBSD releases. Systems running 7.[34]-RELEASE, 8.[12]-RELEASE, 9.0-BETA[123], or 9.0-RC[1,2] can upgrade as follows:

First, a minor change must be made to the freebsd-update code in order for it to accept file names appearing in FreeBSD 9.0 which contain the '%' and '@' characters; without this change, freebsd-update will error out with the message "The update metadata is correctly signed, but failed an integrity check."

# sed -i '' -e 's/=_/=%@_/' /usr/sbin/freebsd-update

Now freebsd-update can fetch bits belonging to 9.0-RC3. During this process freebsd-update will ask for help in merging configuration files.

# freebsd-update upgrade -r 9.0-RC3

Due to changes in the way that FreeBSD is packaged on the release media, two complications may arise in this process if upgrading from FreeBSD 7.x or 8.x:

1. The FreeBSD kernel, which previously could appear in either /boot/kernel or /boot/GENERIC, now only appears as /boot/kernel. As a result, any kernel appearing in /boot/GENERIC will be deleted. Please carefully read the output printed by freebsd-update and confirm that an updated kernel will be placed into /boot/kernel before proceeding beyond this point.

2. The FreeBSD source tree in /usr/src (if present) will be deleted. (Normally freebsd-update will update a source tree, but in this case the changes in release packaging result in freebsd-update not recognizing that the source tree from the old release and the source tree from the new release correspond to the same part of FreeBSD.)

# freebsd-update install

The system must now be rebooted with the newly installed kernel before the non-kernel components are updated.
# shutdown -r now

After rebooting, freebsd-update needs to be run again to install the new userland components:

# freebsd-update install

At this point, users of systems being upgraded from FreeBSD 8.2-RELEASE or earlier will be prompted by freebsd-update to rebuild all third-party applications (e.g., ports installed from the ports tree) due to updates in system libraries. After updating installed third-party applications (and again, only if freebsd-update printed a message indicating that this was necessary), run freebsd-update again so that it can delete the old (no longer used) system libraries:

# freebsd-update install

Finally, reboot into 9.0-RC3:

Checksums:

MD5 (FreeBSD-9.0-RC3-amd64-bootonly.iso) = 53f2bc5a3d18124769bfb066e921559a
MD5 (FreeBSD-9.0-RC3-amd64-disc1.iso) = b88eca54341523713712b184c6a7fc9a
MD5 (FreeBSD-9.0-RC3-amd64-memstick.img) = a9b58348736d4a7a179941e818d33986
MD5 (FreeBSD-9.0-RC3-i386-bootonly.iso) = 86f0410ffb1c55fcb8faf33814e6e95b
MD5 (FreeBSD-9.0-RC3-i386-disc1.iso) = 3585047256b1b8f72319aa55ffa3c3ad
MD5 (FreeBSD-9.0-RC3-i386-memstick.img) = 8be95b49c498e666f87957a8c10997ce
MD5 (FreeBSD-9.0-RC3-ia64-bootonly.iso) = 7a8e99a61d21ae8a5f6be9fb7f878b11
MD5 (FreeBSD-9.0-RC3-ia64-memstick) =
Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)
On Fri, Dec 09, 2011 at 01:33:04AM +0100, Andre Oppermann wrote:

On 08.12.2011 16:34, Luigi Rizzo wrote:

On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote: ...

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO capable hardware setup locally to figure out what I've missed. Most of the machines in my lab are running em(4) NICs which don't support LRO, but I'll see if I can find something which does and perhaps resurrect this patch.

LRO can always be done in software. You can do it at driver, ether_input or ip_input level.

storing LRO state at the driver (as it is done now) is very convenient, because it is trivial to flush the pending segments at the end of an rx interrupt. If you want to do LRO in ether_input() or ip_input(), you need to add another call to flush the LRO state stored there.

a few comments: 1. i don't think it makes sense to send multiple acks on coalesced segments (and the 82599 does not seem to do that). First of all, the acks would get out with minimal spacing (ideally less than 100ns) so chances are that the remote end will see them in a single burst anyways. Secondly, the remote end can easily tell that a single ACK is reporting multiple MSS and behave as if an equivalent number of acks had arrived.

ABC (appropriate byte counting) gets in the way though.

right, during slow start the current ABC specification (RFC3465) sets a pretty low limit on how much the window can be expanded on each ACK. On the other hand...

2. i am a big fan of LRO (and similar solutions), because it can save a lot of repeated work when passing packets up the stack, and the mechanism becomes more and more effective as the system load increases, which is a wonderful property in terms of system stability. For this reason, i think it would be useful to add support for software LRO in the generic code (sys/net/if.c) so that drivers can directly use the software implementation even without hardware support.
It hurts on higher RTT links in the general case. For LAN RTT's it's good.

... on the other hand remember that LRO coalescing is limited to the number of segments that arrive during a mitigation interval, so even on a 10G interface it's only a handful of packets. I better run some simulations to see how long it takes to get full rate on a 10..50ms path when using LRO.

cheers
luigi