[Bug 193246] Bug in IPv6 multicast join(), uncovered by Jenkins
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193246

Ronald Klop <ronald-li...@klop.ws> changed:
        What|Removed|Added
        CC| |ronald-li...@klop.ws

--- Comment #9 from Ronald Klop <ronald-li...@klop.ws> ---
Not a solution, but does it work as a workaround to add this option on the command line to java?

-Djava.net.preferIPv4Stack=true

--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: netmap extra rings and buffers
Thank you!

On 2014-9-4, at 17:48, Luigi Rizzo <ri...@iet.unipi.it> wrote:

On Thu, Sep 04, 2014 at 11:58:28AM +0000, Eggert, Lars wrote:

Hi Luigi,

I'm allocating extra rings and/or extra buffers via the nr_arg1/nr_arg3 parameters for NIOCREGIF. Once I've done that, how do I actually access those rings and buffers? For extra rings, the documentation and example code don't really say anything. For extra buffers, the documentation says nifp->ni_bufs_head will be the index of the first buffer, but doesn't really explain how I can find the buffer given its index (since it's not in a ring, the NETMAP_BUF macro doesn't seem to apply?). The part about "buffers are linked to each other using the first uint32_t as the index" is also unclear to me. Do you have some more text or example code that shows how to use extra rings and buffers?

The field to request extra rings is only important when you want to make sure that the memory region for a VALE port also has space to host some pipes. Otherwise, for physical ports (which at the moment all share the same address space), there is no real need to specify it.

For the extra buffers, remember that NETMAP_BUF() can translate buffer indexes for any netmap buffer, even those not in a ring. All it does is grab the base address of the buffer pool from the ring, and add the buffer index times the buffer size. So you can navigate the pool of extra buffers as follows:

	uint32_t x = nifp->ni_bufs_head;    // index of first buf
	void *p = NETMAP_BUF(any_ring, x);  // address of the first buffer
	x = *((uint32_t *)p);               // index of the next buffer

cheers
luigi
Re: ixgbe CRITICAL: ECC ERROR!! Please Reboot!!
Hi Adrian,

I confirmed with the support staff of the room where the server is that the ambient temperature was normal.

On 04/09/2014 22:46, Marcelo Gondim wrote:
On 04/09/2014 20:48, Adrian Chadd wrote:

Hi, The only time this has happened to me is because the card overheated. Can you check that?

Hi Adrian, The room where the equipment is located is very cold, but I'll check it out. Also seen at the time of the problem: a lot of dropped packets.

# netstat -idn
...
ix0 1500 Link#9  a0:36:9f:2a:6d:ac 18446743423829095869 159 750924631703 53285910688 0 0 0
ix0 -    fe80::a236:9f fe80::a236:9fff:f0 - - 2 - - -
ix1 1500 Link#10 a0:36:9f:2a:6d:ae 18446743954328745465 0 119550050209 20178077451 0 0 0
ix1 -    fe80::a236:9f fe80::a236:9fff:f0 - - 1 - - -
...

119550050209 dropped packets on ix1 and 750924631703 dropped on ix0. Could it be worth upgrading to 10.1-PRERELEASE? Could there be a problem with the driver?

Traffic on ix0: 1.4Gbps output / 600Mbps input
Traffic on ix1: 1.2Gbps output
PPS on ix0: 163Kpps output / 215Kpps input
PPS on ix1: 131Kpps output

Thanks for your help.

-a

On 4 September 2014 16:14, Marcelo Gondim <gon...@bsdinfo.com.br> wrote:

Hi All, I have an Intel X520-SR2 and today it was working when all traffic stopped. I looked in the logs and found this message:

Sep 4 18:29:53 rt01 kernel: ix1:
Sep 4 18:29:53 rt01 kernel: CRITICAL: ECC ERROR!! Please Reboot!!
# uname -a
FreeBSD rt01.x.com.br 10.0-STABLE FreeBSD 10.0-STABLE #10 r267839: Thu Jul 10 15:35:04 BRT 2014 r...@rt01.x.com.br:/usr/obj/usr/src/sys/GONDIM10 amd64

# netstat -m
98324/53476/151800 mbufs in use (current/cache/total)
98301/44951/143252/1014370 mbuf clusters in use (current/cache/total/max)
98301/44897 mbuf+clusters out of packet secondary zone in use (current/cache)
0/421/421/507184 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150276 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84530 16k jumbo clusters in use (current/cache/total/max)
221183K/104955K/326138K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

Best regards,
Gondim
[Bug 193246] Bug in IPv6 multicast join(), uncovered by Jenkins
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193246

--- Comment #10 from Craig Rodrigues <rodr...@freebsd.org> ---
Ronald,

Thanks for the tip.

java -Djava.net.preferIPv4Stack=true MulticastTest

seems to work around the problem. I would still like to see FreeBSD fixed so that this workaround is not required.

Even though it is a lot of work, I would like to see Java on FreeBSD behave out of the box, just like on Linux, without the FreeBSD community needing to push lots of patches upstream to different Java software authors. I want there to be less motivation for people to migrate from FreeBSD to Linux when they are deploying Java applications. That's why I've spent the time analyzing the problem and reporting my findings in this bug report. I find this audit trail quite interesting. :)
[RFC] Patch to improve TSO limitation formula in general
Hi,

I've tested the attached patch with success and would like to have some feedback from other FreeBSD network developers.

The problem is that the current TSO limitation only limits the number of bytes that can be transferred in a TSO packet, and not the number of mbufs. The current solution is a quick and dirty custom m_dup() in the TX path that re-allocates the mbuf chains into 4K ones, to keep it simple. All of this hack can be avoided if the definition of the TSO limit is changed a bit, like shown here:

/*
+ * Structure defining hardware TSO limits.
+ */
+struct if_tso_limit {
+	u_int raw_value[0];	/* access all fields as one */
+	u_char frag_count;	/* maximum number of fragments: 1..255 */
+	u_char frag_size_log2;	/* maximum fragment size: 2 ** (12..16) */
+	u_char hdr_size_log2;	/* maximum header size: 2 ** (2..8) */
+	u_char reserved;	/* zero */
+};

First we need to know the maximum fragment count. A typical value is 32. Second we need to know the maximum fragment size. A typical value is 4K. Last we need to know of any headers that should be subtracted from the maximum. Since this code runs in the fast path, I would like to use u_char for all fields and allow copy-only access as a u_int as an optimization. This avoids kludges and messing with additional header files.

I would like to push this patch after some more testing to -current and then to 10-stable, hopefully before the coming 10 release, because the current solution affects the performance of Mellanox-based network adapters in an unfair way. For example, setting the current TSO limit to 32KBytes, which is OK for all-2K fragments, causes a severe degradation in performance, even though the hardware is fully capable of transmitting 16 4K mbufs.

Comments and reviews are welcome!
--HPS

=== sys/dev/oce/oce_if.c ===
--- sys/dev/oce/oce_if.c (revision 270996)
+++ sys/dev/oce/oce_if.c (local)
@@ -1731,7 +1731,9 @@
 	sc->ifp->if_baudrate = IF_Gbps(10);
 
 #if __FreeBSD_version >= 1000000
-	sc->ifp->if_hw_tsomax = OCE_MAX_TSO_SIZE;
+	sc->ifp->if_hw_tsomax.frag_count = 29;		/* 29 elements */
+	sc->ifp->if_hw_tsomax.frag_size_log2 = 12;	/* 4K */
+	sc->ifp->if_hw_tsomax.hdr_size_log2 = 5;	/* ETH+VLAN 2**5 */
 #endif
 	ether_ifattach(sc->ifp, sc->macaddr.mac_addr);

=== sys/dev/oce/oce_if.h ===
--- sys/dev/oce/oce_if.h (revision 270996)
+++ sys/dev/oce/oce_if.h (local)
@@ -152,7 +152,6 @@
 #define OCE_MAX_TX_ELEMENTS	29
 #define OCE_MAX_TX_DESC		1024
 #define OCE_MAX_TX_SIZE		65535
-#define OCE_MAX_TSO_SIZE	(65535 - ETHER_HDR_LEN)
 #define OCE_MAX_RX_SIZE		4096
 #define OCE_MAX_RQ_POSTS	255
 #define OCE_DEFAULT_PROMISCUOUS	0

=== sys/dev/vmware/vmxnet3/if_vmx.c ===
--- sys/dev/vmware/vmxnet3/if_vmx.c (revision 270996)
+++ sys/dev/vmware/vmxnet3/if_vmx.c (local)
@@ -1722,7 +1722,9 @@
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 	ifp->if_init = vmxnet3_init;
 	ifp->if_ioctl = vmxnet3_ioctl;
-	ifp->if_hw_tsomax = VMXNET3_TSO_MAXSIZE;
+	ifp->if_hw_tsomax.frag_count = VMXNET3_TX_MAXSEGS;
+	ifp->if_hw_tsomax.frag_size_log2 = VMXNET3_TX_MAXSEGSHIFT;
+	ifp->if_hw_tsomax.hdr_size_log2 = 5;	/* ETH+VLAN 2**5 */
 #ifdef VMXNET3_LEGACY_TX
 	ifp->if_start = vmxnet3_start;

=== sys/dev/vmware/vmxnet3/if_vmxvar.h ===
--- sys/dev/vmware/vmxnet3/if_vmxvar.h (revision 270996)
+++ sys/dev/vmware/vmxnet3/if_vmxvar.h (local)
@@ -277,14 +277,13 @@
  */
 #define VMXNET3_TX_MAXSEGS	32
 #define VMXNET3_TX_MAXSIZE	(VMXNET3_TX_MAXSEGS * MCLBYTES)
-#define VMXNET3_TSO_MAXSIZE \
-	(VMXNET3_TX_MAXSIZE - sizeof(struct ether_vlan_header))
 
 /*
  * Maximum supported Tx segment size. The length field in the
  * Tx descriptor is 14 bits.
  */
-#define VMXNET3_TX_MAXSEGSIZE	(1 << 14)
+#define VMXNET3_TX_MAXSEGSHIFT	14
+#define VMXNET3_TX_MAXSEGSIZE	(1 << VMXNET3_TX_MAXSEGSHIFT)
 
 /*
  * The maximum number of Rx segments we accept.
When LRO is enabled,

=== sys/dev/xen/netfront/netfront.c ===
--- sys/dev/xen/netfront/netfront.c (revision 270996)
+++ sys/dev/xen/netfront/netfront.c (local)
@@ -134,7 +134,6 @@
  * to mirror the Linux MAX_SKB_FRAGS constant.
  */
 #define MAX_TX_REQ_FRAGS	(65536 / PAGE_SIZE + 2)
-#define NF_TSO_MAXBURST		((IP_MAXPACKET / PAGE_SIZE) * MCLBYTES)
 
 #define RX_COPY_THRESHOLD	256
@@ -2102,7 +2101,9 @@
 	ifp->if_hwassist = XN_CSUM_FEATURES;
 	ifp->if_capabilities = IFCAP_HWCSUM;
-	ifp->if_hw_tsomax = NF_TSO_MAXBURST;
+	ifp->if_hw_tsomax.frag_count = MAX_TX_REQ_FRAGS;
+	ifp->if_hw_tsomax.frag_size_log2 = PAGE_SHIFT;
+	ifp->if_hw_tsomax.hdr_size_log2 = 5;	/* ETH+VLAN 2**5 */
 
 	ether_ifattach(ifp, np->mac);
Re: ixgbe CRITICAL: ECC ERROR!! Please Reboot!!
Hi,

But is the airflow in the unit sufficient? I had this problem at a previous job: the box was running fine and the room was very cold, but the internal fans in the server were set to be very quiet. That wasn't enough to keep the ixgbe NICs happy, so I had to change the fan settings to always run at full speed. The fan temperature feedback loop was based on sensors on the CPU, _not_ on the peripherals.

-a

On 5 September 2014 07:35, Marcelo Gondim <gon...@bsdinfo.com.br> wrote:

Hi Adrian, I confirmed with the support staff of the room where the server is that the ambient temperature was normal. [...]
Re: ixgbe CRITICAL: ECC ERROR!! Please Reboot!!
On 05/09/2014 16:49, Adrian Chadd wrote:

Hi, But is the airflow in the unit sufficient? I had this problem at a previous job: the box was running fine and the room was very cold, but the internal fans in the server were set to be very quiet. That wasn't enough to keep the ixgbe NICs happy, so I had to change the fan settings to always run at full speed. The fan temperature feedback loop was based on sensors on the CPU, _not_ on the peripherals.

Hi Adrian,

Hmm. I'll check it and improve the internal cooling. :) It is not happy and neither am I. rsrsrsr

Cheers,

-a

On 5 September 2014 07:35, Marcelo Gondim <gon...@bsdinfo.com.br> wrote: [...]
Re: [RFC] Patch to improve TSO limitation formula in general
There are some concerns if we use this with devices that ixl supports:

- The maximum fragment size is 16KB-1, which isn't a power of 2.
- You can't get the maximum TSO size for ixl devices by multiplying the maximum number of fragments by the maximum size. Instead, the number of fragments is AFAIK unlimited, but a segment can only span 8 mbufs (including the [up to 3] mbufs containing the header), and the maximum TSO size is 256KB.

And one question:

- Is hdr_size_log2 supposed to be the length of the L2 header? We can fit 254 L2 bytes in our hardware during a TSO, so if that's the value, I guess that's fine, barring the it-not-being-a-power-of-2 issue.

With all that said, the 8-mbuf limit per segment is a TSO limitation that we'd like to notify the stack about, so I wonder if that could be incorporated along with this. Right now, our driver checks whether a segment in a TSO spans more than six mbufs and then m_defrag()'s the entire chain if one exists; it's less than optimal, but necessary to prevent errors.

- Eric Joyner

On Fri, Sep 5, 2014 at 11:37 AM, Hans Petter Selasky <h...@selasky.org> wrote:

Hi, I've tested the attached patch with success and would like to have some feedback from other FreeBSD network developers. The problem is that the current TSO limitation only limits the number of bytes that can be transferred in a TSO packet, and not the number of mbufs. The current solution is a quick and dirty custom m_dup() in the TX path that re-allocates the mbuf chains into 4K ones, to keep it simple. All of this hack can be avoided if the definition of the TSO limit is changed a bit, like shown here:

/*
+ * Structure defining hardware TSO limits.
+ */
+struct if_tso_limit {
+	u_int raw_value[0];	/* access all fields as one */
+	u_char frag_count;	/* maximum number of fragments: 1..255 */
+	u_char frag_size_log2;	/* maximum fragment size: 2 ** (12..16) */
+	u_char hdr_size_log2;	/* maximum header size: 2 ** (2..8) */
+	u_char reserved;	/* zero */
+};

First we need to know the maximum fragment count. Typical value is 32. Second we need to know the maximum fragment size. Typical value is 4K. Last we need to know of any headers that should be subtracted from the maximum. Since this code runs in the fast path, I would like to use u_char for all fields and allow copy-only access as a u_int as an optimization. This avoids kludges and messing with additional header files.

I would like to push this patch after some more testing to -current and then to 10-stable, hopefully before the coming 10 release, because the current solution affects the performance of Mellanox-based network adapters in an unfair way. For example, setting the current TSO limit to 32KBytes, which is OK for all-2K fragments, causes a severe degradation in performance, even though the hardware is fully capable of transmitting 16 4K mbufs.

Comments and reviews are welcome!

--HPS

___
freebsd-curr...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [RFC] Patch to improve TSO limitation formula in general
On 09/05/14 23:19, Eric Joyner wrote:

There are some concerns if we use this with devices that ixl supports: - The maximum fragment size is 16KB-1, which isn't a power of 2.

Hi Eric,

Multiplying by powers of two is faster than multiplying by non-powers of two. So in this case you would have to use 8KB as the maximum.

- You can't get the maximum TSO size for ixl devices by multiplying the maximum number of fragments by the maximum size. Instead, the number of fragments is AFAIK unlimited, but a segment can only span 8 mbufs (including the [up to 3] mbufs containing the header), and the maximum TSO size is 256KB. And one question: - Is hdr_size_log2 supposed to be the length of the L2 header? We can fit 254 L2 bytes in our hardware during a TSO, so if that's the value, I guess that's fine, barring the it-not-being-a-power-of-2 issue.

This is the ethernet / VLAN headers. It is added together with the TCP/IP header in the end.

With all that said, the 8-mbuf limit per segment is a TSO limitation that we'd like to notify the stack about, so I wonder if that could be incorporated along with this. Right now, our driver checks whether a segment in a TSO spans more than six mbufs and then m_defrag()'s the entire chain if one exists; it's less than optimal, but necessary to prevent errors.

It is not impossible to move from log2 syntax to non-log2 syntax; the logic will be exactly the same, only the required division and multiplication will have a bit of overhead, I guess.

Could you make a patch on top of my patch with the changes you think are required to fully support the ixl hardware? Or propose a new patch which also serves the MLX needs?

Thank you!

--HPS
Re: [RFC] Patch to improve TSO limitation formula in general
Hans Petter Selasky wrote:

On 09/05/14 23:19, Eric Joyner wrote: There are some concerns if we use this with devices that ixl supports: - The maximum fragment size is 16KB-1, which isn't a power of 2.

Hi Eric, Multiplying by powers of two is faster than multiplying by non-powers of two. So in this case you would have to use 8KB as the maximum.

Well, I'm no architecture expert, but I really doubt the CPU delay of a non-power-of-2 multiply/divide is significant relative to doing smaller TSO segments. Long ago (as in the 1970s) I did work on machines where shifts for power-of-2 multiply/divide were preferable, but these days I doubt it is going to matter??

- You can't get the maximum TSO size for ixl devices by multiplying the maximum number of fragments by the maximum size. Instead, the number of fragments is AFAIK unlimited, but a segment can only span 8 mbufs (including the [up to 3] mbufs containing the header), and the maximum TSO size is 256KB. And one question: - Is hdr_size_log2 supposed to be the length of the L2 header? We can fit 254 L2 bytes in our hardware during a TSO, so if that's the value, I guess that's fine, barring the it-not-being-a-power-of-2 issue.

This is the ethernet / VLAN headers. It is added together with the TCP/IP header in the end.

With all that said, the 8-mbuf limit per segment is a TSO limitation that we'd like to notify the stack about, so I wonder if that could be incorporated along with this. Right now, our driver checks whether a segment in a TSO spans more than six mbufs and then m_defrag()'s the entire chain if one exists; it's less than optimal, but necessary to prevent errors.
At this time, if there is a limit of 8 TSO segments (mbufs) in a transmit list, you will need to set:

	ifp->if_hw_tsomax = 8 * MCLBYTES - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);

just before the call to ether_ifattach(ifp);

I do have an untested patch (attached in case anyone is interested) which adds if_hw_tsomaxseg, which drivers can set to their maximum number of transmit segments (mbufs) for TSO. This value is then used by tcp_output() to generate appropriately sized TSO segments. However, I'm just working on getting a way to test this patch, so I can't say if/when it will be in head.

rick

It is not impossible to move from log2 syntax to non-log2 syntax; the logic will be exactly the same, only the required division and multiplication will have a bit of overhead, I guess. Could you make a patch on top of my patch with the changes you think are required to fully support the ixl hardware? Or propose a new patch which also serves the MLX needs? Thank you! --HPS

--- kern/uipc_sockbuf.c.sav	2014-01-30 20:27:17.0 -0500
+++ kern/uipc_sockbuf.c	2014-01-30 22:12:08.0 -0500
@@ -965,6 +965,39 @@ sbsndptr(struct sockbuf *sb, u_int off,
 }
 
 /*
+ * Return the first mbuf for the provided offset.
+ */
+struct mbuf *
+sbsndmbuf(struct sockbuf *sb, u_int off, long *first_len)
+{
+	struct mbuf *m;
+
+	KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__));
+
+	*first_len = 0;
+	/*
+	 * Is off below stored offset? Happens on retransmits.
+	 * If so, just use sb_mb.
+	 */
+	if (sb->sb_sndptr == NULL || sb->sb_sndptroff > off)
+		m = sb->sb_mb;
+	else {
+		m = sb->sb_sndptr;
+		off -= sb->sb_sndptroff;
+	}
+	while (off > 0 && m != NULL) {
+		if (off < m->m_len)
+			break;
+		off -= m->m_len;
+		m = m->m_next;
+	}
+	if (m != NULL)
+		*first_len = m->m_len - off;
+
+	return (m);
+}
+
+/*
  * Drop a record off the front of a sockbuf and move the next record to the
  * front.
  */
--- sys/sockbuf.h.sav	2014-01-30 20:42:28.0 -0500
+++ sys/sockbuf.h	2014-01-30 22:08:43.0 -0500
@@ -153,6 +153,8 @@
 int	sbreserve_locked(struct sockbuf *sb, struct thread *td);
 struct mbuf *
	sbsndptr(struct sockbuf *sb, u_int off, u_int len, u_int *moff);
+struct mbuf *
+	sbsndmbuf(struct sockbuf *sb, u_int off, long *first_len);
 void	sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb);
 int	sbwait(struct sockbuf *sb);
 int	sblock(struct sockbuf *sb, int flags);
--- netinet/tcp_input.c.sav	2014-01-30 19:37:52.0 -0500
+++ netinet/tcp_input.c	2014-01-30 19:39:07.0 -0500
@@ -3627,6 +3627,7 @@ tcp_mss(struct tcpcb *tp, int offer)
 	if (cap.ifcap & CSUM_TSO) {
 		tp->t_flags |= TF_TSO;
 		tp->t_tsomax = cap.tsomax;
+		tp->t_tsomaxsegs = cap.tsomaxsegs;
 	}
 }
--- netinet/tcp_output.c.sav	2014-01-30 18:55:15.0 -0500
+++ netinet/tcp_output.c	2014-01-30 22:18:56.0 -0500
@@ -166,8 +166,8 @@ int
 tcp_output(struct tcpcb *tp)
 {
 	struct socket *so = tp->t_inpcb->inp_socket;
-	long len, recwin, sendwin;
-	int off, flags, error = 0;	/* Keep compiler happy */
+	long len, recwin,
Re: [RFC] Patch to improve TSO limitation formula in general
Hans Petter Selasky wrote:

Hi, I've tested the attached patch with success and would like to have some feedback from other FreeBSD network developers. The problem is that the current TSO limitation only limits the number of bytes that can be transferred in a TSO packet, and not the number of mbufs. The current solution is a quick and dirty custom m_dup() in the TX path that re-allocates the mbuf chains into 4K ones, to keep it simple. All of this hack can be avoided if the definition of the TSO limit is changed a bit, like shown here:

/*
+ * Structure defining hardware TSO limits.
+ */
+struct if_tso_limit {
+	u_int raw_value[0];	/* access all fields as one */
+	u_char frag_count;	/* maximum number of fragments: 1..255 */
+	u_char frag_size_log2;	/* maximum fragment size: 2 ** (12..16) */
+	u_char hdr_size_log2;	/* maximum header size: 2 ** (2..8) */
+	u_char reserved;	/* zero */
+};

First we need to know the maximum fragment count. Typical value is 32. Second we need to know the maximum fragment size. Typical value is 4K. Last we need to know of any headers that should be subtracted from the maximum. Since this code runs in the fast path, I would like to use u_char for all fields and allow copy-only access as a u_int as an optimization. This avoids kludges and messing with additional header files. I would like to push this patch after some more testing to -current and then to 10-stable, hopefully before the coming 10 release, because the current solution affects the performance of Mellanox-based network adapters in an unfair way. For example, setting the current TSO limit to 32KBytes, which is OK for all-2K fragments, causes a severe degradation in performance, even though the hardware is fully capable of transmitting 16 4K mbufs.

Ok, I didn't see this until now, but I will take a look at the patch.
My main comment is that I tried using a mix of 2K and 4K mbuf clusters in NFS and was able (with some effort) to get the UMA allocator all messed up, where it would basically be stuck because it couldn't allocate boundary tags. As such, until this issue w.r.t. UMA is resolved, mixing MCLBYTES and MJUMPAGESIZE clusters is very dangerous imho. (alc@ did send me a simple patch related to this UMA problem, but I haven't been able to test it yet.)

rick

ps: For the M_WAITOK case, the allocator problem shows up as threads sleeping on "btalloc", which happens in vmem_bt_alloc() in kern/subr_vmem.c. It may never happen on 64bit arches, but it can definitely happen on i386.

Comments and reviews are welcome!

--HPS
Re: [RFC] Patch to improve TSO limitation formula in general
On 09/06/14 00:09, Rick Macklem wrote:

Hans Petter Selasky wrote: On 09/05/14 23:19, Eric Joyner wrote: There are some concerns if we use this with devices that ixl supports: - The maximum fragment size is 16KB-1, which isn't a power of 2.

Hi Eric, Multiplying by powers of two is faster than multiplying by non-powers of two. So in this case you would have to use 8KB as the maximum.

Well, I'm no architecture expert, but I really doubt the CPU delay of a non-power-of-2 multiply/divide is significant relative to doing smaller TSO segments. Long ago (as in the 1970s) I did work on machines where shifts for power-of-2 multiply/divide were preferable, but these days I doubt it is going to matter??

Hi,

You also need to patch the LAGG and VLAN drivers?

--HPS
Re: [RFC] Patch to improve TSO limitation formula in general
Hans Petter Selasky wrote:

On 09/06/14 00:09, Rick Macklem wrote: Well, I'm no architecture expert, but I really doubt the CPU delay of a non-power-of-2 multiply/divide is significant relative to doing smaller TSO segments. Long ago (as in the 1970s) I did work on machines where shifts for power-of-2 multiply/divide were preferable, but these days I doubt it is going to matter??

Hi, You also need to patch the LAGG and VLAN drivers?

Yep. I already ran into the fact that these drivers didn't pass if_hw_tsomax up and patched them for that recently. The same will be necessary for if_hw_tsomaxseg if/when it goes into head. As I said, this patch is currently completely untested and, even once I get it tested and working, there will need to be a discussion on freebsd-net@ w.r.t. whether it is appropriate for head. I will take a look at your patch around Monday.

Btw, when setting if_hw_tsomax as I suggested in the first post, you will still end up doing a lot of m_defrag() calls for NFS RPC messages, but at least they will get through.

rick
Re: [RFC] Patch to improve TSO limitation formula in general
Hans Petter Selasky wrote this message on Fri, Sep 05, 2014 at 20:37 +0200: I've tested the attached patch with success and would like to have some feedback from other FreeBSD network developers. The problem is that the current TSO limitation only limits the number of bytes that can be transferred in a TSO packet and not the number of mbufs. The current solution is to have a quick and dirty custom m_dup() in the TX path to re-allocate the mbuf chains into 4K ones to make it simple. All of this hack can be avoided if the definition of the TSO limit can be changed a bit, like shown here:

+/*
+ * Structure defining hardware TSO limits.
+ */
+struct if_tso_limit {
+	u_int raw_value[0];	/* access all fields as one */
+	u_char frag_count;	/* maximum number of fragments: 1..255 */
+	u_char frag_size_log2;	/* maximum fragment size: 2 ** (12..16) */
+	u_char hdr_size_log2;	/* maximum header size: 2 ** (2..8) */
+	u_char reserved;	/* zero */
+};

Please make this a union if you really need to access the raw_value, or just drop it... Is this done to fit in the u_int t_tsomax that is in tcpcb? Also, I couldn't find the code, but if the TCP connection needs to be sent out a different interface that has more restrictive TSO requirements, do we properly handle this case? My quick reading of the code seems to imply that we only get the TSO requirements on connection setup and never update them... As these are per-interface, saving memory by packing them isn't really that effective these days... Per the later comments, yes, a shift MAY be faster than a full mul/div by a cycle or two, but this won't make a huge difference here. If the programmer has to use crazy macros or do the math EVERY time they use the fields, this will end up with less readable/maintainable code at the cost of improving performance by maybe .001%, so my vote is for u_ints instead, converted to their proper sizes... Comments on the patch: You can drop the .reserved initialization...
It is common C knowledge that unassigned members are initialized to zero... The IF_TSO_LIMIT_CMP macros seem excessive... Do you ever see a need to use other operators? And if so, would they be useful? I'd just convert it to:

#define IF_TSO_LIMIT_EQ(a, b) ((a)->raw_value[0] == (b)->raw_value[0])

I am a bit puzzled by this code:

+	/* Check if fragment limit will be exceeded */
+	if (cur_frags >= rem_frags) {
+		max_len += min(cur_length, rem_frags << if_hw_tsomax.frag_size_log2);
+		break;
+	}

specifically the max_len += line... The code seems to say that if we would overrun the remaining frags (maybe you want a > instead of a >=) we increase max_len... It seems like if including this frag would put us over the limit, we should just skip it (break w/o increasing max_len)... -- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not.
How can sshuttle be used properly with FreeBSD (and with DNS) ?
I would like to use sshuttle (http://github.com/apenwarr/sshuttle) on FreeBSD. I have it working for TCP connections, but it does not properly tunnel DNS requests. The documentation for sshuttle says that ipfw forward rules will not properly forward UDP packets, and so when it runs on FreeBSD, sshuttle inserts divert rules instead. The project author believes that this will work properly (inserting divert rules to tunnel UDP), but I am not having any success. BUT, I already have a divert rule (and natd running) on this system even before I run sshuttle at all, because the system won't function as a normal gateway unless I use divert/natd. I would prefer to run a gateway without divert/natd, but since both sides of this gateway are non-routable IPs, I cannot do that; in order to function as a gateway with 10.x.x.x networks on both sides, I need to run natd/divert. So that means that when sshuttle inserts its own divert rules, they conflict with the existing ones, and I am not running a second natd daemon, so I think it all just falls apart. How can this be fixed? Is anyone out there using sshuttle on FreeBSD with the --dns switch? Here is what my ipfw.conf looks like BEFORE I run sshuttle:

add 1000 divert natd ip from any to any in via xl0
add 2000 divert natd ip from any to any out via xl0

and in rc.conf:

gateway_enable="yes"
natd_enable="yes"
natd_interface="xl0"

Again, this works fine: I have a functioning internet gateway, and both of the interfaces on it have non-routable IP addresses. Then I run sshuttle and it *also* works fine, but only for TCP. It does not tunnel UDP (DNS) properly like it is supposed to, and I think the reason is that I already have diverting/natd going on, and then I run sshuttle and it inserts another two divert rules into ipfw. But I am not sure what the fix would be... Thanks.
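Not an answer, but one way to see whether the rules actually collide is to compare the ruleset before and after starting sshuttle. This is only a diagnostic sketch; the rule numbers sshuttle picks are whatever it chose at runtime, not something documented here:

```shell
# List the active ruleset while sshuttle is running. ipfw matches rules
# in ascending rule-number order, and the first matching divert rule
# grabs the packet. If sshuttle's UDP divert rules land AFTER the natd
# diverts at 1000/2000, DNS packets get NATed by natd before sshuttle's
# divert ever sees them, which would explain TCP working but DNS not.
ipfw list

# For comparison, the pre-sshuttle ruleset from ipfw.conf above:
#   01000 divert natd ip from any to any in via xl0
#   02000 divert natd ip from any to any out via xl0
```

If sshuttle's rules do sort after 1000/2000, renumbering the natd rules higher (so sshuttle's diverts match first) would be the thing to try.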
When to use and not use divert/natd ...
Hello, For many years I would build FreeBSD firewalls and they would be very, very simple: I just set gateway_enable="yes" in rc.conf and everything just worked. However, these firewalls *always* had real, routable IPs on both sides. Both interfaces had real, routable IPs. Now I have a firewall that has two non-routable IPs for its interfaces and is connected to an internet router with the real IP. When I try to build a very simple firewall it does not work, and I am forced to use ipdivert and natd. If I use ipdivert and natd, it works just fine. So, am I correct that I can create a simple gateway without natd/divert as long as both interfaces have real IPs, but if both interfaces have non-routable IPs, I am forced to use divert/natd? Is that correct?
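That matches my understanding: with routable addresses on both sides, plain IP forwarding is enough, but with private addresses behind an upstream router, something has to translate source addresses, and natd over a divert socket is one way to do that. A sketch of the two configurations (the interface name xl0 is assumed, as in the related sshuttle message; this is illustrative, not a complete firewall config):

```shell
# Case 1: both interfaces routable -- plain forwarding is sufficient.
# /etc/rc.conf
gateway_enable="yes"

# Case 2: non-routable (10.x.x.x) addresses -- translation is required,
# because the upstream router cannot route replies back to private IPs.
# /etc/rc.conf
gateway_enable="yes"
natd_enable="yes"
natd_interface="xl0"
# plus divert rules in the ipfw ruleset, e.g.:
#   add 1000 divert natd ip from any to any in via xl0
#   add 2000 divert natd ip from any to any out via xl0
```

The forwarding itself works the same in both cases; it is the un-routable return path that forces NAT in case 2.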
Re: ixgbe CRITICAL: ECC ERROR!! Please Reboot!!
On 05/09/2014 17:17, Marcelo Gondim wrote: On 05/09/2014 16:49, Adrian Chadd wrote: Hi, But is the airflow in the unit sufficient? I had this problem at a previous job - the box was running fine, the room was very cold, but the internal fans in the server were set to be very quiet. It wasn't enough to keep the ixgbe NICs happy. I had to change the fan settings to just always run full speed. The fan temperature feedback loop was based on sensors on the CPU, _not_ on the peripherals. Hi Adrian, Ummm. I'll check it and improve the internal cooling. :) It is not happy and neither am I, heh. Cheers, Besides the network interface overheating problem, I am including some information here. Could you tell me if anything looks strange, or is this normal?

dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15
dev.ix.0.%driver: ix
dev.ix.0.%location: slot=0 function=0 handle=\_SB_.PCI1.BR48.S3F0
dev.ix.0.%pnpinfo: vendor=0x8086 device=0x154d subvendor=0x8086 subdevice=0x7b11 class=0x02
dev.ix.0.%parent: pci131
dev.ix.0.fc: 3
dev.ix.0.enable_aim: 1
dev.ix.0.advertise_speed: 0
dev.ix.0.dropped: 0
dev.ix.0.mbuf_defrag_failed: 0
dev.ix.0.watchdog_events: 0
dev.ix.0.link_irq: 121769
dev.ix.0.queue0.interrupt_rate: 5319
dev.ix.0.queue0.irqs: 7900830877
dev.ix.0.queue0.txd_head: 1037
dev.ix.0.queue0.txd_tail: 1037
dev.ix.0.queue0.tso_tx: 142
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 0
dev.ix.0.queue0.tx_packets: 9725701450
dev.ix.0.queue0.rxd_head: 1175
dev.ix.0.queue0.rxd_tail: 1174
dev.ix.0.queue0.rx_packets: 13069276955
dev.ix.0.queue0.rx_bytes: 3391061018
dev.ix.0.queue0.rx_copies: 8574407
dev.ix.0.queue0.lro_queued: 0
dev.ix.0.queue0.lro_flushed: 0
dev.ix.0.queue1.interrupt_rate: 41666
dev.ix.0.queue1.irqs: 7681141208
dev.ix.0.queue1.txd_head: 219
dev.ix.0.queue1.txd_tail: 221
dev.ix.0.queue1.tso_tx: 57
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 44334
dev.ix.0.queue1.tx_packets: 10196891433
dev.ix.0.queue1.rxd_head: 1988
dev.ix.0.queue1.rxd_tail: 1987
dev.ix.0.queue1.rx_packets: 13210132242
dev.ix.0.queue1.rx_bytes: 4317357059
dev.ix.0.queue1.rx_copies: 8131936
dev.ix.0.queue1.lro_queued: 0
dev.ix.0.queue1.lro_flushed: 0
dev.ix.0.queue2.interrupt_rate: 5319
dev.ix.0.queue2.irqs: 7647486080
dev.ix.0.queue2.txd_head: 761
dev.ix.0.queue2.txd_tail: 761
dev.ix.0.queue2.tso_tx: 409
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 54207
dev.ix.0.queue2.tx_packets: 10161246425
dev.ix.0.queue2.rxd_head: 1874
dev.ix.0.queue2.rxd_tail: 1872
dev.ix.0.queue2.rx_packets: 13175551880
dev.ix.0.queue2.rx_bytes: 4472798418
dev.ix.0.queue2.rx_copies: 7488876
dev.ix.0.queue2.lro_queued: 0
dev.ix.0.queue2.lro_flushed: 0
dev.ix.0.queue3.interrupt_rate: 50
dev.ix.0.queue3.irqs: 7641129521
dev.ix.0.queue3.txd_head: 2039
dev.ix.0.queue3.txd_tail: 2039
dev.ix.0.queue3.tso_tx: 9
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 150346
dev.ix.0.queue3.tx_packets: 10619971896
dev.ix.0.queue3.rxd_head: 1055
dev.ix.0.queue3.rxd_tail: 1054
dev.ix.0.queue3.rx_packets: 13137835529
dev.ix.0.queue3.rx_bytes: 4063197306
dev.ix.0.queue3.rx_copies: 8188713
dev.ix.0.queue3.lro_queued: 0
dev.ix.0.queue3.lro_flushed: 0
dev.ix.0.queue4.interrupt_rate: 5319
dev.ix.0.queue4.irqs: 7439824996
dev.ix.0.queue4.txd_head: 26
dev.ix.0.queue4.txd_tail: 26
dev.ix.0.queue4.tso_tx: 553912
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 10658683718
dev.ix.0.queue4.rxd_head: 684
dev.ix.0.queue4.rxd_tail: 681
dev.ix.0.queue4.rx_packets: 13204786830
dev.ix.0.queue4.rx_bytes: 3700845239
dev.ix.0.queue4.rx_copies: 8193379
dev.ix.0.queue4.lro_queued: 0
dev.ix.0.queue4.lro_flushed: 0
dev.ix.0.queue5.interrupt_rate: 15151
dev.ix.0.queue5.irqs: 7456613396
dev.ix.0.queue5.txd_head: 603
dev.ix.0.queue5.txd_tail: 603
dev.ix.0.queue5.tso_tx: 17
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 10639139790
dev.ix.0.queue5.rxd_head: 404
dev.ix.0.queue5.rxd_tail: 403
dev.ix.0.queue5.rx_packets: 13144301293
dev.ix.0.queue5.rx_bytes: 3986784766
dev.ix.0.queue5.rx_copies: 8256195
dev.ix.0.queue5.lro_queued: 0
dev.ix.0.queue5.lro_flushed: 0
dev.ix.0.queue6.interrupt_rate: 125000
dev.ix.0.queue6.irqs: 7466940576
dev.ix.0.queue6.txd_head: 1784
dev.ix.0.queue6.txd_tail: 1784
dev.ix.0.queue6.tso_tx: 2001
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 9784312967
dev.ix.0.queue6.rxd_head: 395
dev.ix.0.queue6.rxd_tail: 394
dev.ix.0.queue6.rx_packets: 13103079970
dev.ix.0.queue6.rx_bytes: 3581485264
dev.ix.0.queue6.rx_copies: 7336569
dev.ix.0.queue6.lro_queued: 0
dev.ix.0.queue6.lro_flushed: 0
dev.ix.0.queue7.interrupt_rate: 5319
dev.ix.0.queue7.irqs: 7486391989
dev.ix.0.queue7.txd_head: 1549
dev.ix.0.queue7.txd_tail: 1549
dev.ix.0.queue7.tso_tx: 2052
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
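One thing that does stand out in the dump: no_desc_avail is nonzero on queues 1, 2 and 3, meaning those TX rings ran out of descriptors at some point. A quick way to check whether those counters are still climbing (a diagnostic sketch using the sysctl names from the dump; the 5-second interval is arbitrary):

```shell
# Print the per-queue TX descriptor-exhaustion counters for ix0 every
# 5 seconds. Steadily increasing values mean the rings are saturating
# right now; constant values mean the exhaustion happened in the past.
while :; do
	date
	sysctl dev.ix.0 | grep no_desc_avail
	sleep 5
done
```

If they climb under load, larger TX rings (hw.ix tunables) or spreading traffic over more queues would be worth looking at; if they are static, it was likely a transient burst.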