Re: zfs-related(?) panic in cache_enter: wrong vnode type

2011-12-08 Thread Pawel Jakub Dawidek
On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote:
 (kgdb) bt
 #0  doadump (textdump=1) at pcpu.h:224
 #1  0x804f6d3b in kern_reboot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:447
 #2  0x804f63e9 in panic (fmt=0x104 <Address 0x104 out of bounds>) at
 /usr/src/sys/kern/kern_shutdown.c:635
 #3  0x80585f46 in cache_enter (dvp=0xfe003d4763c0,
 vp=0xfe0142517000, cnp=0xff82393b3708) at 
 /usr/src/sys/kern/vfs_cache.c:726
 #4  0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0,
 nm=0xff82393b3140 "..", vpp=0xff82393b36e0, cnp=0xff82393b3708,
 nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480,
 flags=0) at
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470

Which FreeBSD version is it? The line numbers don't seem to match either
HEAD or stable/8. It would be best if you could send me a few lines
around zfs_vnops.c:1470 and vfs_cache.c:726.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: Dog Food tm

2011-12-08 Thread Daniel Gerzo

On Tue, 06 Dec 2011 13:51:05 -0800, Sean Bruno wrote:

Was trying to use gmirror(4) or zfs(4) today to get a machine in the
cluster setup with s/w raid and was completely flummoxed by the
intricacies of manual setup.  Chances are, I just am not smart enough to
wind my way through the various how-tos and wiki pages that I've been
browsing to get the job done.


Why use gmirror under ZFS, when ZFS itself supports software RAID?


If someone wants to work on modifying bsdinstaller to do s/w raid via
one of these mechanisms, clusteradm@ can provide you a two disk SATA
machine that can be used for this purpose.

Sean


--
Kind regards
  Daniel
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: zfs-related(?) panic in cache_enter: wrong vnode type

2011-12-08 Thread Andriy Gapon

on 08/12/2011 10:13 Pawel Jakub Dawidek said the following:
 On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote:
 (kgdb) bt
 #0  doadump (textdump=1) at pcpu.h:224
 #1  0x804f6d3b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
 #2  0x804f63e9 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:635
 #3  0x80585f46 in cache_enter (dvp=0xfe003d4763c0, vp=0xfe0142517000, cnp=0xff82393b3708) at /usr/src/sys/kern/vfs_cache.c:726
 #4  0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0, nm=0xff82393b3140 "..", vpp=0xff82393b36e0, cnp=0xff82393b3708, nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470

 
 Which FreeBSD version is it? The line numbers don't seem to match either
 HEAD or stable/8. It would be best if you could send me a few lines
 around zfs_vnops.c:1470 and vfs_cache.c:726.
 

It's recent svn head, r228017, just with some local unrelated modifications.
1458 #ifdef FREEBSD_NAMECACHE
1459 	/*
1460 	 * Insert name into cache (as non-existent) if appropriate.
1461 	 */
1462 	if (error == ENOENT && (cnp->cn_flags & MAKEENTRY) && nameiop != CREATE)
1463 		cache_enter(dvp, *vpp, cnp);
1464 	/*
1465 	 * Insert name into cache if appropriate.
1466 	 */
1467 	if (error == 0 && (cnp->cn_flags & MAKEENTRY)) {
1468 		if (!(cnp->cn_flags & ISLASTCN) ||
1469 		    (nameiop != DELETE && nameiop != RENAME)) {
1470 			cache_enter(dvp, *vpp, cnp);
1471 		}
1472 	}
1473 #endif



 716 	if (flag == NCF_ISDOTDOT) {
 717 		/*
 718 		 * See if we are trying to add a ".." entry, but some other lookup
 719 		 * has populated the v_cache_dd pointer already.
 720 		 */
 721 		if (dvp->v_cache_dd != NULL) {
 722 			CACHE_WUNLOCK();
 723 			cache_free(ncp);
 724 			return;
 725 		}
 726 		KASSERT(vp == NULL || vp->v_type == VDIR,
 727 		    ("wrong vnode type %p", vp));
 728 		dvp->v_cache_dd = ncp;
 729 	}
-- 
Andriy Gapon


Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)

2011-12-08 Thread Daniel Kalchev



On 07.12.11 22:23, Luigi Rizzo wrote:


Sorry, forgot to mention that the above is with TSO DISABLED
(which is not the default). TSO seems to have a very bad
interaction with HWCSUM and non-zero mitigation.


I have this on both sender and receiver

# ifconfig ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO>

ether 00:25:90:35:22:f1
inet 10.2.101.11 netmask 0xffffff00 broadcast 10.2.101.255
media: Ethernet autoselect (autoselect full-duplex)
status: active

without LRO on either end

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 
2990.7170 Mbps

nuttcp-t: host-retrans = 0
nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44
nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 
2918.3794 Mbps

nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86
nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 
230404+0csw


with LRO on receiver

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 
4044.3989 Mbps

nuttcp-t: host-retrans = 2
nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08
nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 
3945.9174 Mbps

nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98
nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 
156333+0csw


About 1/4 improvement...

With LRO on both sender and receiver

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 
4320.4840 Mbps

nuttcp-t: host-retrans = 1
nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67
nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 
4215.4829 Mbps

nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34
nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 
188794+147csw


Even better...

With LRO on sender only:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 
3471.2847 Mbps

nuttcp-t: host-retrans = 0
nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01
nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 
3386.6984 Mbps

nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67
nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 
117367+0csw




also remember that hw.ixgbe.max_interrupt_rate has only
effect at module load -- i.e. you set it with the bootloader,
or with kenv before loading the module.


I have this in /boot/loader.conf

kern.ipc.nmbclusters=512000
hw.ixgbe.max_interrupt_rate=0

on both sender and receiver.


Please retry the measurements disabling tso (on both sides, but
it really matters only on the sender). Also, LRO requires HWCSUM.


How do I set HWCSUM? Is this different 

Re: Dog Food tm

2011-12-08 Thread Peter Maloney
On 12/08/2011 09:35 AM, Daniel Gerzo wrote:
 On Tue, 06 Dec 2011 13:51:05 -0800, Sean Bruno wrote:
 Was trying to use gmirror(4) or zfs(4) today to get a machine in the
 cluster setup with s/w raid and was completely flummoxed by the
 intricacies of manual setup.  Chances are, I just am not smart enough to
 wind my way through the various how-tos and wiki pages that I've been
 browsing to get the job done.

 Why using gmirror under zfs, when zfs itself supports software raid?

 If someone wants to work on modifying bsdinstaller to do s/w raid via
 one of these mechanisms, clusteradm@ can provide you a two disk SATA
 machine that can be used for this purpose.

 Sean

And what problems did you run into?

This guide worked for me:
http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror

(but the zfs create ... part was too much typing, so I did it with a
script that I added to the CD)

 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de




Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)

2011-12-08 Thread Luigi Rizzo
On Thu, Dec 08, 2011 at 12:06:26PM +0200, Daniel Kalchev wrote:
 
 
 On 07.12.11 22:23, Luigi Rizzo wrote:
 
 Sorry, forgot to mention that the above is with TSO DISABLED
 (which is not the default). TSO seems to have a very bad
 interaction with HWCSUM and non-zero mitigation.
 
 I have this on both sender and receiver
 
 # ifconfig ix1
 ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
 	options=4bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LRO>
 ether 00:25:90:35:22:f1
 inet 10.2.101.11 netmask 0xffffff00 broadcast 10.2.101.255
 media: Ethernet autoselect (autoselect full-duplex)
 status: active
 
 without LRO on either end
 
 # nuttcp -t -T 5 -w 128 -v 10.2.101.11
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms
 nuttcp-t: send window size = 131768, receive window size = 66608
 nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 
 2990.7170 Mbps
 nuttcp-t: host-retrans = 0
 nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44
 nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 10.2.101.12
 nuttcp-r: send window size = 33304, receive window size = 131768
 nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 
 2918.3794 Mbps
 nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86
 nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 
 230404+0csw
 
 with LRO on receiver
 
 # nuttcp -t -T 5 -w 128 -v 10.2.101.11
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms
 nuttcp-t: send window size = 131768, receive window size = 66608
 nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 
 4044.3989 Mbps
 nuttcp-t: host-retrans = 2
 nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08
 nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 10.2.101.12
 nuttcp-r: send window size = 33304, receive window size = 131768
 nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 
 3945.9174 Mbps
 nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98
 nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 
 156333+0csw
 
 About 1/4 improvement...
 
 With LRO on both sender and receiver
 
 # nuttcp -t -T 5 -w 128 -v 10.2.101.11
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms
 nuttcp-t: send window size = 131768, receive window size = 66608
 nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 
 4320.4840 Mbps
 nuttcp-t: host-retrans = 1
 nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67
 nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 10.2.101.12
 nuttcp-r: send window size = 33304, receive window size = 131768
 nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 
 4215.4829 Mbps
 nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34
 nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 
 188794+147csw
 
 Even better...
 
 With LRO on sender only:
 
 # nuttcp -t -T 5 -w 128 -v 10.2.101.11
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms
 nuttcp-t: send window size = 131768, receive window size = 66608
 nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 
 3471.2847 Mbps
 nuttcp-t: host-retrans = 0
 nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01
 nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 10.2.101.12
 nuttcp-r: send window size = 33304, receive window size = 131768
 nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 
 3386.6984 Mbps
 nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67
 nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 
 117367+0csw
 
 
 also remember that hw.ixgbe.max_interrupt_rate has only
 effect at module load -- i.e. you set it with the bootloader,
 or with kenv before loading the module.
 
 I have this in /boot/loader.conf
 
 kern.ipc.nmbclusters=512000
 hw.ixgbe.max_interrupt_rate=0
 
 on both sender and receiver.
 

Re: Burning CDs and DVDs on SATA drive in FreeBSD 9.0

2011-12-08 Thread Thomas Mueller
from my last message:

I can't get cdrtools (cdrecord or readcd) to work on FreeBSD 9.0-RC2, and now I 
see RC3 is available.

I tried "options ATA_CAM" in the kernel config, removing "device atapicam", but still
readcd -scanbus  or
cdrecord -scanbus
refuses to work, running as root.

camcontrol devlist shows my DVD drive, and I am able to mount and read 
/dev/cd0.

Is cdrtools buggy, or maybe the FreeBSD port is buggy?

Daniel O'Connor docon...@gsoft.com.au responded:

 Define refuses to work..
 
 I have an oldish 9.0-CURRENT which works with cdrecord..
 [titus 1:16] ~ cdrecord -scanbus
 Cdrecord-ProDVD-ProBD-Clone 3.00 (amd64-unknown-freebsd9.0) Copyright (C) 
 1995-2010 Jörg Schilling
 Using libscg version 'schily-0.9'.
 scsibus1:
 1,0,0   100) 'PIONEER ' 'DVD-RW  DVR-112D' '1.09' Removable CD-ROM
 1,1,0   101) *
 1,2,0   102) *
 1,3,0   103) *
 1,4,0   104) *
 1,5,0   105) *
 1,6,0   106) *
 1,7,0   107) *

cdrecord -scanbus
produces


cdrecord: Inappropriate ioctl for device. CAMIOCOMMAND ioctl failed. Cannot 
open or use SCSI driver.
cdrecord: For possible targets try 'cdrecord -scanbus'.
cdrecord: For possible transport specifiers try 'cdrecord dev=help'.
Cdrecord-ProDVD-ProBD-Clone 3.00 (amd64-unknown-freebsd9.0) Copyright (C) 
1995-2010 Jörg Schilling

About the same with readcd, and cdrecord dev=help is no help.

from grarpamp grarp...@gmail.com:

 In the past, I've used the ftp cdrtools pkg (made from the
 port of course) and it failed to work. It's a popular tool so my
 machine was probably out of sync. Same with burncd. However,
 compiling the current cdrtools source worked fine. So I'd try
 that first, compare, and send up a bug if need be.
 
 Try to skip the scan by specifying the BTL or devpath on the
 command line. The scan is a big part of the port and might
 have breakage, at least for the app below.
 
 Also, if you're doing audio, someone over on ports has said
 they're doing an update to cdparanoia. It's minor, but useful
 for that crowd.
 
 makefs and burncd are part of the base, at least on RELENG_8.
 And makefs is used in the official releases. So they should just work.
 
 Good luck.

You mean build cdrtools, or possibly cdrkit, directly from the source outside
the FreeBSD ports collection?

I could then see if I could make that into my own port, or else install to 
prefix /usr/local2.

If there is a conflict between cdrkit and cdrtools, maybe install the other to 
prefix /usr/local3?

Then I could spawn a subshell with /usr/local2/bin or /usr/local3/bin added to 
the PATH.

If I get something to work, I could report back to the ports mailing list and 
share the benefits with others.
 
Tom



Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)

2011-12-08 Thread Lawrence Stewart

On 12/08/11 05:08, Luigi Rizzo wrote:

On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote:

On 06.12.2011 22:06, Luigi Rizzo wrote:

...

Even in my experiments there is a lot of instability in the results.
I don't know exactly where the problem is, but the high number of
read syscalls, and the huge impact of setting interrupt_rate=0
(defaults at 16us on the ixgbe) makes me think that there is something
that needs investigation in the protocol stack.

Of course we don't want to optimize specifically for the one-flow-at-10G
case, but devising something that makes the system less affected
by short timing variations, and can pass upstream interrupt mitigation
delays would help.


I'm not sure the variance is only coming from the network card and
driver side of things.  The TCP processing and interactions with
scheduler and locking probably play a big role as well.  There have
been many changes to TCP recently and maybe an inefficiency that
affects high-speed single sessions throughput has crept in.  That's
difficult to debug though.


I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which
seems slightly faster than HEAD) using MTU=1500 and various
combinations of card capabilities (hwcsum,tso,lro), different window
sizes and interrupt mitigation configurations.

default latency is 16us, l=0 means no interrupt mitigation.
lro is the software implementation of lro (tcp_lro.c)
hwlro is the hardware one (on 82599). Using a window of 100 Kbytes
seems to give the best results.

Summary:


[snip]


- enabling software lro on the transmit side actually slows
   down the throughput (4-5Gbit/s instead of 8.0).
   I am not sure why (perhaps acks are delayed too much) ?
   Adding a couple of lines in tcp_lro to reject
   pure acks seems to have much better effect.

The tcp_lro patch below might actually be useful also for
other cards.

--- tcp_lro.c   (revision 228284)
+++ tcp_lro.c   (working copy)
@@ -245,6 +250,8 @@

 	ip_len = ntohs(ip->ip_len);
 	tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+	if (tcp_data_len == 0)
+		return -1;	/* not on ack */


 /*


There is a bug with our LRO implementation (first noticed by Jeff 
Roberson) that I started fixing some time back but dropped the ball on. 
The crux of the problem is that we currently only send an ACK for the 
entire LRO chunk instead of all the segments contained therein. Given 
that most stacks rely on the ACK clock to keep things ticking over, the 
current behaviour kills performance. It may well be the cause of the 
performance loss you have observed. WIP patch is at:


http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have 
LRO capable hardware setup locally to figure out what I've missed. Most 
of the machines in my lab are running em(4) NICs which don't support 
LRO, but I'll see if I can find something which does and perhaps 
resurrect this patch.


If anyone has any ideas what I'm missing in the patch to make it work, 
please let me know.


Cheers,
Lawrence


Re: [RFC] VIA south bridge watchdog support

2011-12-08 Thread Fabien Thomas


   Hi,
 
 If someone wants to take a look, I've added support for the VIA south bridge watchdog.
 It has been tested on VX900 but should work with VX800, VX855, and CX700.
 
 http://people.freebsd.org/~fabient/patch-watchdog-via-rev1
 

Posted rev2: http://people.freebsd.org/~fabient/patch-watchdog-via-rev2
- add / test VT8251
- man page update
- reset FIRED bit when detected

Other south bridges may be supported as well, but feedback is needed.

Fabien


Re: Burning CDs and DVDs on SATA drive in FreeBSD 9.0

2011-12-08 Thread Michiel Boland

On 12/08/2011 01:25, Thomas Mueller wrote:

I can't get cdrtools (cdrecord or readcd) to work on FreeBSD 9.0-RC2, and now I 
see RC3 is available.

I tried "options ATA_CAM" in the kernel config, removing "device atapicam", but still
readcd -scanbus  or
cdrecord -scanbus
refuses to work, running as root.

camcontrol devlist shows my DVD drive, and I am able to mount and read 
/dev/cd0.

Is cdrtools buggy, or maybe the FreeBSD port is buggy?

I see from the Makefile that sysutils/cdrdao is broken on FreeBSD 9.x, does not 
link.

I downloaded cdrkit-1.1.11.tar.gz so as to extract and have a look at it, and 
possibly build it on my own, outside the ports system.

I see the version in ports system is 1.1.9, maybe 1.1.11 could do better?

Maybe the FreeBSD developers were too hasty to deprecate burncd?


Recompile the port; the CAM ioctl numbers have changed.

Cheers
Michiel


Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Slawa Olhovchenkov
On Mon, Dec 05, 2011 at 08:27:03PM +0100, Luigi Rizzo wrote:

 Hi,
 I am trying to establish the baseline performance for 10G throughput
 over TCP, and would like to collect some data points.  As a testing
 program i am using nuttcp from ports (as good as anything, i
 guess -- it is reasonably flexible, and if you use it in
 TCP with relatively large writes, the overhead for syscalls
 and gettimeofday() shouldn't kill you).
 
 I'd be very grateful if you could do the following test:
 
 - have two machines connected by a 10G link
 - on one run nuttcp -S
 - on the other one run nuttcp -t -T 5 -w 128 -v the.other.ip
 
 and send me a dump of the output, such as the one(s) at the end of
 the message.
 
 I am mostly interested in two configurations:
 - one over loopback, which should tell how fast is the CPU+memory
   As an example, one of my machines does about 15 Gbit/s, and
   one of the faster ones does about 44 Gbit/s
 
 - one over the wire using 1500 byte mss. Here it really matters
   how good is the handling of small MTUs.
 
 As a data point, on my machines i get 2..3.5 Gbit/s on the
 slow machine with a 1500 byte mtu and default card setting.
 Clearing the interrupt mitigation register (so no mitigation)
 brings the rate to 5-6 Gbit/s. Same hardware with linux does
 about 8 Gbit/s. HEAD seems 10-20% slower than RELENG_8 though i
 am not sure who is at fault.
 
 The receive side is particularly critical - on FreeBSD
 the receiver is woken up every two packets (do the math
 below, between the number of rx calls and throughput and mss),
 resulting in almost 200K activations per second, and despite
 the fact that interrupt mitigation is set to a much lower
 value (so incoming packets should be batched).
 On linux, i see much fewer reads, presumably the process is
 woken up only at the end of a burst.

About the relative performance of FreeBSD and Linux, I wrote to -performance@
in Jan '11 ("Interrupt performance").

 
  EXAMPLES OF OUTPUT --
 
  nuttcp -t -T 5 -w 128 -v  10.0.1.2
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.0.1.2
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 10.0.1.2 with mss=1460, RTT=0.103 ms
 nuttcp-t: send window size = 131400, receive window size = 65700
 nuttcp-t: 3095.0982 MB in 5.00 real seconds = 633785.85 KB/sec = 5191.9737 
 Mbps
 nuttcp-t: host-retrans = 0
 nuttcp-t: 49522 I/O calls, msec/call = 0.10, calls/sec = 9902.99
 nuttcp-t: 0.0user 2.7sys 0:05real 54% 100i+2639d 752maxrss 0+3pf 258876+6csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 10.0.1.4
 nuttcp-r: send window size = 33580, receive window size = 131400
 nuttcp-r: 3095.0982 MB in 5.17 real seconds = 613526.42 KB/sec = 5026.0084 
 Mbps
 nuttcp-r: 1114794 I/O calls, msec/call = 0.00, calls/sec = 215801.03
 nuttcp-r: 0.1user 3.5sys 0:05real 69% 112i+1104d 626maxrss 0+15pf 
 507653+188csw
 
 
  nuttcp -t -T 5 -w 128 -v localhost
 nuttcp-t: v6.1.2: socket
 nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> localhost
 nuttcp-t: time limit = 5.00 seconds
 nuttcp-t: connect to 127.0.0.1 with mss=14336, RTT=0.051 ms
 nuttcp-t: send window size = 143360, receive window size = 71680
 nuttcp-t: 26963.4375 MB in 5.00 real seconds = 5521440.59 KB/sec = 45231.6413 
 Mbps
 nuttcp-t: host-retrans = 0
 nuttcp-t: 431415 I/O calls, msec/call = 0.01, calls/sec = 86272.51
 nuttcp-t: 0.0user 4.6sys 0:05real 93% 102i+2681d 774maxrss 0+3pf 2510+1csw
 
 nuttcp-r: v6.1.2: socket
 nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
 nuttcp-r: accept from 127.0.0.1
 nuttcp-r: send window size = 43008, receive window size = 143360
 nuttcp-r: 26963.4375 MB in 5.20 real seconds = 5313135.74 KB/sec = 43525.2080 
 Mbps
 nuttcp-r: 767807 I/O calls, msec/call = 0.01, calls/sec = 147750.09
 nuttcp-r: 0.1user 3.9sys 0:05real 79% 98i+2570d 772maxrss 0+16pf 311014+8csw
 
 
 on the server, run  


Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?)

2011-12-08 Thread Luigi Rizzo
On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote:
 On 12/08/11 05:08, Luigi Rizzo wrote:
...
 I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which
 seems slightly faster than HEAD) using MTU=1500 and various
 combinations of card capabilities (hwcsum,tso,lro), different window
 sizes and interrupt mitigation configurations.
 
 default latency is 16us, l=0 means no interrupt mitigation.
 lro is the software implementation of lro (tcp_lro.c)
 hwlro is the hardware one (on 82599). Using a window of 100 Kbytes
 seems to give the best results.
 
 Summary:
 
 [snip]
 
 - enabling software lro on the transmit side actually slows
down the throughput (4-5Gbit/s instead of 8.0).
I am not sure why (perhaps acks are delayed too much) ?
Adding a couple of lines in tcp_lro to reject
pure acks seems to have much better effect.
 
 The tcp_lro patch below might actually be useful also for
 other cards.
 
 --- tcp_lro.c   (revision 228284)
 +++ tcp_lro.c   (working copy)
 @@ -245,6 +250,8 @@
 
  	ip_len = ntohs(ip->ip_len);
  	tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
 +	if (tcp_data_len == 0)
 +		return -1;	/* not on ack */
 
 
  /*
 
 There is a bug with our LRO implementation (first noticed by Jeff 
 Roberson) that I started fixing some time back but dropped the ball on. 
 The crux of the problem is that we currently only send an ACK for the 
 entire LRO chunk instead of all the segments contained therein. Given 
 that most stacks rely on the ACK clock to keep things ticking over, the 
 current behaviour kills performance. It may well be the cause of the 
 performance loss you have observed.

I should clarify better.
First of all, i tested two different LRO implementations: our
Software LRO (tcp_lro.c), and the Hardware LRO which is implemented
by the 82599 (called RSC or receive-side-coalescing in the 82599
data sheets). Jack Vogel and Navdeep Parhar (both in Cc) can
probably comment on the logic of both.

In my tests, either SW or HW LRO on the receive side HELPED A LOT,
not just in terms of raw throughput but also in terms of system
load on the receiver. On the receive side, LRO packs multiple data
segments into one that is passed up the stack.

As you mentioned this also reduces the number of acks generated,
but not dramatically (consider, the LRO is bounded by the number
of segments received in the mitigation interval).
In my tests the number of reads() on the receiver was reduced by
approx a factor of 3 compared to the !LRO case, meaning 4-5 segment
merged by LRO. Navdeep reported some numbers for cxgbe with similar
numbers.

Using Hardware LRO on the transmit side had no ill effect.
Being done in hardware i have no idea how it is implemented.

Using Software LRO on the transmit side did give a significant
throughput reduction. I can't explain the exact cause, though it
is possible that, between reducing the number of segments seen by the
receiver and collapsing the ACKs it generates, the sender starves.
But it could well be that the extra delay in passing up the ACKs
is what limits performance.
Either way, since the HW LRO did a fine job, i was trying to figure
out whether avoiding LRO on pure acks could help, and the two-line
patch above did help.

Note, my patch was just a proof-of-concept, and may cause
reordering if a data segment is followed by a pure ack.
But this can be fixed easily, handling a pure ack as
an out-of-sequence packet in tcp_lro_rx().

 WIP patch is at:
 http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch
 
 Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have 
 LRO capable hardware setup locally to figure out what I've missed. Most 
 of the machines in my lab are running em(4) NICs which don't support 
 LRO, but I'll see if I can find something which does and perhaps 
 resurrect this patch.

a few comments:
1. i don't think it makes sense to send multiple acks on
   coalesced segments (and the 82599 does not seem to do that).
   First of all, the acks would get out with minimal spacing (ideally
   less than 100ns) so chances are that the remote end will see
   them in a single burst anyways. Secondly, the remote end can
   easily tell that a single ACK is reporting multiple MSS and
   behave as if an equivalent number of acks had arrived.

2. i am a big fan of LRO (and similar solutions), because it can save
   a lot of repeated work when passing packets up the stack, and the
   mechanism becomes more and more effective as the system load increases,
   which is a wonderful property in terms of system stability.

   For this reason, i think it would be useful to add support for software
   LRO in the generic code (sys/net/if.c) so that drivers can directly use
   the software implementation even without hardware support.

3. similar to LRO, it would make sense to implement a software TSO
   mechanism where 

Re: Stop scheduler on panic

2011-12-08 Thread John Baldwin

On 12/4/11 5:11 PM, Andriy Gapon wrote:

on 02/12/2011 17:30 m...@freebsd.org said the following:

On Fri, Dec 2, 2011 at 2:05 AM, Andriy Gapona...@freebsd.org  wrote:

on 02/12/2011 06:36 John Baldwin said the following:

Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true when kdb was
active).  But I think these two changes should cover critical_exit() ok.



I attempted to start a discussion about this a few times already :-)
Should we treat kdb context the same as SCHEDULER_STOPPED context (in the
current definition) ?  That is, skip all locks in the same fashion?
There are pros and contras.


Does kdb pause all CPUs with an interrupt (NMI or regular interrupt, I
can no longer remember...) when it enters?  If so, then I'd say
whether it enters via sysctl or panic doesn't matter.  It's in a
special environment where nothing else is running, which is what is
needed for proper exploration of the machine (via breakpoint, for
debugging a hang, etc).

Maybe the question is, why wouldn't SCHEDULER_STOPPED be true
regardless of how kdb is entered?


I think that the discussion that followed has clarified this point a bit.
SCHEDULER_STOPPED perhaps needs a better name :-)  Currently it, the name,
reflects the state of the scheduler, but not why the scheduler is stopped and
not the greater state of the system (in panic), nor how we should handle that
state (bypass locking).  So I'd love something like BYPASS_LOCKING_BECAUSE
_SCHEDULER_IS_STOPPED_IN_PANIC, were it not so unwieldy :)


Oh, hmm.  Yes, being in the debugger should not potentially corrupt lock 
state, so in that sense it is a weaker stop.


--
John Baldwin


Re: Dog Food tm

2011-12-08 Thread Sean Bruno
On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:
 And what problems did you run into?
 

More or less, trying to do gmirror(4) style mirroring on GPT partitions
doesn't work.  See http://www.freebsd.org/doc/handbook/geom-mirror.html
for the BIG RED WARNING that says why.

 This guide worked for me:
 http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror

That, along with a lot of how to's, is out of date in the FreeBSD 9
world.  I would suspect that my experience of attempting to setup a
mirrored volume won't be unique.

BSDInstaller and its predecessor Sysinstall don't have any code to
create or destroy zfs(4) or geom(4) volumes.  So, the amount of exposure
to real users is approaching 0 in comparison to the number of people who
really do use FreeBSD.  

I have my hands full with other projects at the moment, but I'm more
than happy to grant access to a two disk SATA server if someone wants to
enhance BSDInstall with zfs(4) or geom(4) volume management features.

At a minimum, you *should* be able to take 2 disks and make a mirrored
volume with either tool.  
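As a reference point for what the installer would need to automate, the ZFS version of that minimal two-disk mirror is nearly a one-liner. This is a hedged sketch only: the device names ada0/ada1 and the pool name are assumptions, and a bootable root-on-ZFS layout needs the additional steps from the wiki guide mentioned in this thread.

```sh
# Sketch only: assumes two blank SATA disks, ada0 and ada1.
zpool create tank mirror ada0 ada1
zpool status tank   # should show a mirror-0 vdev with both disks ONLINE
```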

Sean



Re: Dog Food tm

2011-12-08 Thread Tom Evans
On Thu, Dec 8, 2011 at 3:55 PM, Sean Bruno sean...@yahoo-inc.com wrote:
 On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:
 And what problems did you run into?


 More or less, trying to do gmirror(4) style mirroring on GPT partitions
 doesn't work.  See http://www.freebsd.org/doc/handbook/geom-mirror.html
 for the BIG RED WARNING that says why.


Er, gmirroring GPT _partitions_ works just fine. It is when you try to
gmirror an entire disk that is partitioned with GPT that you have
issues, as gmirror trashes the secondary GPT table at the end of the
disk. You do not have that issue with individual GPT partitions.
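A hedged sketch of that partition-level approach (disk names, partition size, and the mirror label are assumptions, not from this thread):

```sh
# Partition both disks identically with GPT, then mirror the partitions.
gpart create -s gpt ada0
gpart create -s gpt ada1
gpart add -t freebsd-ufs -s 100G ada0   # creates ada0p1
gpart add -t freebsd-ufs -s 100G ada1   # creates ada1p1
gmirror label -v data ada0p1 ada1p1     # mirror the partitions, not the disks
newfs /dev/mirror/data
```

Because the gmirror metadata then lives in the last sector of each partition rather than the last sector of the disk, the secondary GPT table at the end of the disk is left intact.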

Cheers

Tom


Re: Dog Food tm

2011-12-08 Thread Johan Hendriks

Sean Bruno schreef:

On Thu, 2011-12-08 at 02:08 -0800, Peter Maloney wrote:

And what problems did you run into?


More or less, trying to do gmirror(4) style mirroring on GPT partitions
doesn't work.  See http://www.freebsd.org/doc/handbook/geom-mirror.html
for the BIG RED WARNING that says why.


This guide worked for me:
http://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror

That, along with a lot of how to's, is out of date in the FreeBSD 9
world.  I would suspect that my experience of attempting to setup a
mirrored volume won't be unique.

BSDInstaller and its predecessor Sysinstall don't have any code to
create or destroy zfs(4) or geom(4) volumes.  So, the amount of exposure
to real users is approaching 0 in comparison to the number of people who
really do use FreeBSD.

I have my hands full with other projects at the moment, but I'm more
than happy to grant access to a two disk SATA server if someone wants to
enhance BSDInstall with zfs(4) or geom(4) volume management features.

At a minimum, you *should* be able to take 2 disks and make a mirrored
volume with either tool.

Sean

also a good guide is here.
http://unix-heaven.org/node/24
And gmirroring GPT partitions is no problem; problems only arise with
whole disks.


regards
Johan Hendriks


Re: Dog Food tm

2011-12-08 Thread Bruce Cran

On 08/12/2011 15:55, Sean Bruno wrote:

BSDInstaller and its predecessor Sysinstall don't have any code to
create or destroy zfs(4) or geom(4) volumes.  So, the amount of exposure
to real users is approaching 0 in comparison to the number of people who
really do use FreeBSD.

I have my hands full with other projects at the moment, but I'm more
than happy to grant access to a two disk SATA server if someone wants to
enhance BSDInstall with zfs(4) or geom(4) volume management features.


I don't know if it supports RAID, but the geom-aware rewrite of sade(4) 
supports zfs: http://butcher.heavennet.ru/sade/
Unfortunately it was never completed because of the difficulties of 
using ncurses/dialog.


--
Bruce Cran


Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Andre Oppermann

On 08.12.2011 14:11, Lawrence Stewart wrote:

On 12/08/11 05:08, Luigi Rizzo wrote:

On Wed, Dec 07, 2011 at 11:59:43AM +0100, Andre Oppermann wrote:

On 06.12.2011 22:06, Luigi Rizzo wrote:

...

Even in my experiments there is a lot of instability in the results.
I don't know exactly where the problem is, but the high number of
read syscalls, and the huge impact of setting interrupt_rate=0
(defaults at 16us on the ixgbe) makes me think that there is something
that needs investigation in the protocol stack.

Of course we don't want to optimize specifically for the one-flow-at-10G
case, but devising something that makes the system less affected
by short timing variations, and can pass upstream interrupt mitigation
delays would help.


I'm not sure the variance is only coming from the network card and
driver side of things. The TCP processing and interactions with
scheduler and locking probably play a big role as well. There have
been many changes to TCP recently and maybe an inefficiency that
affects high-speed single sessions throughput has crept in. That's
difficult to debug though.


I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which
seems slightly faster than HEAD) using MTU=1500 and various
combinations of card capabilities (hwcsum,tso,lro), different window
sizes and interrupt mitigation configurations.

default latency is 16us, l=0 means no interrupt mitigation.
lro is the software implementation of lro (tcp_lro.c)
hwlro is the hardware one (on 82599). Using a window of 100 Kbytes
seems to give the best results.

Summary:


[snip]


- enabling software lro on the transmit side actually slows
down the throughput (4-5Gbit/s instead of 8.0).
I am not sure why (perhaps acks are delayed too much) ?
Adding a couple of lines in tcp_lro to reject
pure acks seems to have much better effect.

The tcp_lro patch below might actually be useful also for
other cards.

--- tcp_lro.c (revision 228284)
+++ tcp_lro.c (working copy)
@@ -245,6 +250,8 @@

ip_len = ntohs(ip->ip_len);
tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+ if (tcp_data_len == 0)
+ return -1; /* not on ack */


/*


There is a bug with our LRO implementation (first noticed by Jeff Roberson) 
that I started fixing
some time back but dropped the ball on. The crux of the problem is that we 
currently only send an
ACK for the entire LRO chunk instead of all the segments contained therein. 
Given that most stacks
rely on the ACK clock to keep things ticking over, the current behaviour kills 
performance. It may
well be the cause of the performance loss you have observed. WIP patch is at:

http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have LRO 
capable hardware setup
locally to figure out what I've missed. Most of the machines in my lab are 
running em(4) NICs which
don't support LRO, but I'll see if I can find something which does and perhaps 
resurrect this patch.

If anyone has any ideas what I'm missing in the patch to make it work, please 
let me know.


On low RTT's the accumulated ACKing probably doesn't make any difference.
The congestion window will grow very fast anyway.  On longer RTT's it sure
will make a difference.  Unless you have a 10Gig path with  50ms or so it's
difficult to empirically test though.

--
Andre


Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Andre Oppermann

On 08.12.2011 16:34, Luigi Rizzo wrote:

On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote:

On 12/08/11 05:08, Luigi Rizzo wrote:

...

I ran a bunch of tests on the ixgbe (82599) using RELENG_8 (which
seems slightly faster than HEAD) using MTU=1500 and various
combinations of card capabilities (hwcsum,tso,lro), different window
sizes and interrupt mitigation configurations.

default latency is 16us, l=0 means no interrupt mitigation.
lro is the software implementation of lro (tcp_lro.c)
hwlro is the hardware one (on 82599). Using a window of 100 Kbytes
seems to give the best results.

Summary:


[snip]


- enabling software lro on the transmit side actually slows
   down the throughput (4-5Gbit/s instead of 8.0).
   I am not sure why (perhaps acks are delayed too much) ?
   Adding a couple of lines in tcp_lro to reject
   pure acks seems to have much better effect.

The tcp_lro patch below might actually be useful also for
other cards.

--- tcp_lro.c   (revision 228284)
+++ tcp_lro.c   (working copy)
@@ -245,6 +250,8 @@

 ip_len = ntohs(ip->ip_len);
 tcp_data_len = ip_len - (tcp->th_off << 2) - sizeof (*ip);
+   if (tcp_data_len == 0)
+   return -1;  /* not on ack */


 /*


There is a bug with our LRO implementation (first noticed by Jeff
Roberson) that I started fixing some time back but dropped the ball on.
The crux of the problem is that we currently only send an ACK for the
entire LRO chunk instead of all the segments contained therein. Given
that most stacks rely on the ACK clock to keep things ticking over, the
current behaviour kills performance. It may well be the cause of the
performance loss you have observed.


I should clarify better.
First of all, i tested two different LRO implementations: our
Software LRO (tcp_lro.c), and the Hardware LRO which is implemented
by the 82599 (called RSC or receive-side-coalescing in the 82599
data sheets). Jack Vogel and Navdeep Parhar (both in Cc) can
probably comment on the logic of both.

In my tests, either SW or HW LRO on the receive side HELPED A LOT,
not just in terms of raw throughput but also in terms of system
load on the receiver. On the receive side, LRO packs multiple data
segments into one that is passed up the stack.

As you mentioned, this also reduces the number of acks generated,
but not dramatically (consider that LRO is bounded by the number
of segments received in the mitigation interval).
In my tests the number of reads() on the receiver was reduced by
approx a factor of 3 compared to the !LRO case, meaning 4-5 segments
merged by LRO. Navdeep reported similar numbers for cxgbe.

Using Hardware LRO on the transmit side had no ill effect.
Being done in hardware i have no idea how it is implemented.

Using Software LRO on the transmit side did give a significant
throughput reduction. I can't explain the exact cause, though it
is possible that between reducing the number of segments to the
receiver and collapsing ACKs that it generates, the sender starves.
But it could well be that it is the extra delay on passing up the ACKs
that limits performance.
Either way, since the HW LRO did a fine job, i was trying to figure
out whether avoiding LRO on pure acks could help, and the two-line
patch above did help.

Note, my patch was just a proof-of-concept, and may cause
reordering if a data segment is followed by a pure ack.
But this can be fixed easily, handling a pure ack as
an out-of-sequence packet in tcp_lro_rx().


 WIP patch is at:
http://people.freebsd.org/~lstewart/patches/misctcp/tcplro_multiack_9.x.r219723.patch

Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have
LRO capable hardware setup locally to figure out what I've missed. Most
of the machines in my lab are running em(4) NICs which don't support
LRO, but I'll see if I can find something which does and perhaps
resurrect this patch.


LRO can always be done in software.  You can do it at driver, ether_input
or ip_input level.


a few comments:
1. i don't think it makes sense to send multiple acks on
coalesced segments (and the 82599 does not seem to do that).
First of all, the acks would get out with minimal spacing (ideally
less than 100ns) so chances are that the remote end will see
them in a single burst anyways. Secondly, the remote end can
easily tell that a single ACK is reporting multiple MSS and
behave as if an equivalent number of acks had arrived.


ABC (appropriate byte counting) gets in the way though.


2. i am a big fan of LRO (and similar solutions), because it can save
a lot of repeated work when passing packets up the stack, and the
mechanism becomes more and more effective as the system load increases,
which is a wonderful property in terms of system stability.

For this reason, i think it would be useful to add support for software
LRO in the generic code (sys/net/if.c) so that drivers can 

Adding bool, true, and false to the kernel

2011-12-08 Thread Matthew Fleming
Hello -current!

Recently on -arch@, we discussed adding the C99 keywords bool, true,
and false to the kernel.  I now have patches to do this as well as fix
up some build issues.

The original thread was here:

http://lists.freebsd.org/pipermail/freebsd-arch/2011-November/011937.html

I split the patches in three:

http://people.freebsd.org/~mdf/0001-e1000-ixgbe-fix-code-to-not-define-bool-true-false-w.patch

fixes up the e1000 and ixgbe code.  Jack, can you please let me know
if there are any issues.  This should work to build whether or not
sys/types.h has the new defines; I am testing make universe right now.

http://people.freebsd.org/~mdf/0002-Fix-code-to-not-define-bool-true-false-when-already-.patch

fixes the other code in the sys/ directory that gives build conflicts.
 Since I wasn't sure of the origin of the drivers, I conservatively
left all their defines alone to allow the same driver code to build on
both CURRENT and 9.0.  If some of the drivers are wholly-owned by
FreeBSD (unlike e1000 and ixgbe) and are not expected to be built on
older releases, then the use of the old defines can be removed.  If
anyone knows the provenance of these files, please advise.

http://people.freebsd.org/~mdf/0003-Define-bool-true-and-false-in-types.h-for-_KERNEL-co.patch

actually defines bool, true, and false, and adds some extra paranoia
in stdbool.h for anyone who has hacked their local build system and
project repo to include stdbool.h in a kernel file.  I also bumped
__FreeBSD_version, though this is probably not necessary since
__bool_true_false_are_defined is a better check than the
__FreeBSD_version.

This code should be MFC-able to stable/9 after 9.0 is released.

Thanks,
matthew


maxio is not exported by mps(4)

2011-12-08 Thread Maksim Yevmenkin
Dear -hackers,

Can someone please enlighten me as to why maxio is not set by mps(4),
and what would be a reasonable number to set it to if I felt inclined
to do so? The default DFLTPHYS of 64K seems a bit low to me.

thanks,
max


Re: zfs i/o hangs on 9-PRERELEASE

2011-12-08 Thread Mark Felder

Hi all,

Just wanted to report back that I found time to do more diagnostics.
ZFS/FreeBSD/etc are not to blame. ZFS/FreeBSD never reported any I/O
errors, and scrubs always came up clean, because the failing disk had
trouble reading but would eventually return accurate data. It was sort
of like having a RAIDZ with one disk that sometimes had 30s latency!
Seatools confirmed this disk was failing, and after replacing it all
issues went away.



FreeBSD 9.0-RC3 Available...

2011-12-08 Thread Ken Smith

The third and what should be final Release Candidate build for the
9.0-RELEASE release cycle is now available.  Since this is the
beginning of a brand new branch (stable/9) I cross-post the
announcements to both -current and -stable.  But just so you know
most of the developers active in head and stable/9 pay more attention
to the -current mailing list.  If you notice problems you can report
them through the normal Gnats PR system or on the -current mailing
list.

This should be the last of the test builds.  We hope to begin the final
release builds in about a week.  The 9.0-RELEASE cycle will be tracked
here:

http://wiki.freebsd.org/Releng/9.0TODO

The location of the FTP install tree and ISOs is the same as it has
been for BETA2/BETA3/RC1/RC2.  The layout to a large degree is being
dictated by the new build infrastructure and installer.  But it's not
particularly well suited to humans so I've added a shorter pathway to
the ISOs.  Unless there are lots of complaints about the layout we'll
stick with this for the release.

ISO images for amd64, i386, ia64, powerpc, powerpc64, and sparc64 are
available here:

  ftp://ftp.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/9.0/

That directory is a set of symbolic links to the ISO images for all of
the supported architectures, and checksum files (for example there is
a symlink named CHECKSUM.MD5-amd64 that points to the CHECKSUM.MD5 file
for the amd64 architecture).

MD5/SHA256 checksums are tacked on below.

If you would like to use csup/cvsup mechanisms to access the source
tree, the branch tag to use is now RELENG_9_0; if you use "." (head)
you will get 10-CURRENT.  If you would like to access the source tree
via SVN it is svn://svn.freebsd.org/base/releng/9.0/.  We still have
the nit that the creation of a new SVN branch winds up causing what
looks like a check-in of the entire tree in CVS (a side-effect of the
svn2cvs exporter) so mergemaster -F is your friend if you are using
csup/cvsup.

FreeBSD Update
--

The freebsd-update(8) utility supports binary upgrades of i386 and amd64 systems
running earlier FreeBSD releases. Systems running 7.[34]-RELEASE,
8.[12]-RELEASE, 9.0-BETA[123], or 9.0-RC[1,2] can upgrade as follows:

First, a minor change must be made to the freebsd-update code in order
for it to accept file names appearing in FreeBSD 9.0 which contain the '%'
and '@' characters; without this change, freebsd-update will error out
with the message "The update metadata is correctly signed, but failed an
integrity check."

# sed -i '' -e 's/=_/=%@_/' /usr/sbin/freebsd-update

Now freebsd-update can fetch bits belonging to 9.0-RC3.  During this process
freebsd-update will ask for help in merging configuration files.

# freebsd-update upgrade -r 9.0-RC3

Due to changes in the way that FreeBSD is packaged on the release media, two
complications may arise in this process if upgrading from FreeBSD 7.x or 8.x:
1. The FreeBSD kernel, which previously could appear in either /boot/kernel
or /boot/GENERIC, now only appears as /boot/kernel.  As a result, any kernel
appearing in /boot/GENERIC will be deleted.  Please carefully read the output
printed by freebsd-update and confirm that an updated kernel will be placed
into /boot/kernel before proceeding beyond this point.
2. The FreeBSD source tree in /usr/src (if present) will be deleted.  (Normally
freebsd-update will update a source tree, but in this case the changes in
release packaging result in freebsd-update not recognizing that the source tree
from the old release and the source tree from the new release correspond to the
same part of FreeBSD.)

# freebsd-update install

The system must now be rebooted with the newly installed kernel before the
non-kernel components are updated.

# shutdown -r now

After rebooting, freebsd-update needs to be run again to install the new
userland components:

# freebsd-update install

At this point, users of systems being upgraded from FreeBSD 8.2-RELEASE or
earlier will be prompted by freebsd-update to rebuild all third-party
applications (e.g., ports installed from the ports tree) due to updates in
system libraries.

After updating installed third-party applications (and again, only if
freebsd-update printed a message indicating that this was necessary), run
freebsd-update again so that it can delete the old (no longer used) system
libraries:

# freebsd-update install
Finally, reboot into 9.0-RC3:

Checksums:

MD5 (FreeBSD-9.0-RC3-amd64-bootonly.iso) = 53f2bc5a3d18124769bfb066e921559a
MD5 (FreeBSD-9.0-RC3-amd64-disc1.iso) = b88eca54341523713712b184c6a7fc9a
MD5 (FreeBSD-9.0-RC3-amd64-memstick.img) = a9b58348736d4a7a179941e818d33986

MD5 (FreeBSD-9.0-RC3-i386-bootonly.iso) = 86f0410ffb1c55fcb8faf33814e6e95b
MD5 (FreeBSD-9.0-RC3-i386-disc1.iso) = 3585047256b1b8f72319aa55ffa3c3ad
MD5 (FreeBSD-9.0-RC3-i386-memstick.img) = 8be95b49c498e666f87957a8c10997ce

MD5 (FreeBSD-9.0-RC3-ia64-bootonly.iso) = 7a8e99a61d21ae8a5f6be9fb7f878b11
MD5 (FreeBSD-9.0-RC3-ia64-memstick) = 

Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

2011-12-08 Thread Luigi Rizzo
On Fri, Dec 09, 2011 at 01:33:04AM +0100, Andre Oppermann wrote:
 On 08.12.2011 16:34, Luigi Rizzo wrote:
 On Fri, Dec 09, 2011 at 12:11:50AM +1100, Lawrence Stewart wrote:
...
 Jeff tested the WIP patch and it *doesn't* fix the issue. I don't have
 LRO capable hardware setup locally to figure out what I've missed. Most
 of the machines in my lab are running em(4) NICs which don't support
 LRO, but I'll see if I can find something which does and perhaps
 resurrect this patch.
 
 LRO can always be done in software.  You can do it at driver, ether_input
 or ip_input level.

storing LRO state at the driver (as it is done now) is very convenient,
because it is trivial to flush the pending segments at the end of
an rx interrupt. If you want to do LRO in ether_input() or ip_input(),
you need to add another call to flush the LRO state stored there.

 a few comments:
 1. i don't think it makes sense to send multiple acks on
 coalesced segments (and the 82599 does not seem to do that).
 First of all, the acks would get out with minimal spacing (ideally
 less than 100ns) so chances are that the remote end will see
 them in a single burst anyways. Secondly, the remote end can
 easily tell that a single ACK is reporting multiple MSS and
 behave as if an equivalent number of acks had arrived.
 
 ABC (appropriate byte counting) gets in the way though.

right, during slow start the current ABC specification (RFC3465)
sets a pretty low limit on how much the window can be expanded
on each ACK. On the other hand...

 2. i am a big fan of LRO (and similar solutions), because it can save
 a lot of repeated work when passing packets up the stack, and the
 mechanism becomes more and more effective as the system load increases,
 which is a wonderful property in terms of system stability.
 
 For this reason, i think it would be useful to add support for software
 LRO in the generic code (sys/net/if.c) so that drivers can directly use
 the software implementation even without hardware support.
 
 It hurts on higher RTT links in the general case.  For LAN RTT's
 it's good.

... on the other hand remember that LRO coalescing is limited to
the number of segments that arrive during a mitigation interval,
so even on a 10G interface it's only a handful of packets.
I better run some simulations to see how long it takes to
get full rate on a 10..50ms path when using LRO.

cheers
luigi