Re: em(4) ierrs [solved]
Hi Stuart

On 21.09.2010 01:28, Stuart Henderson wrote:
> I would try wbng first. Failing that, lm. I doubt you would need to
> disable ichiic, but that would be the next step if there's no
> improvement.

Well, disabling wbng seems to be the solution. After one day of normal traffic levels we do not see any Ierrs anymore...

Thank you, Stuart, for the helpful advice. Can somebody explain how this driver (which is for reading voltage levels, fan speeds etc., if I did not misinterpret the manpage) causes this strange behaviour? I'm just curious...

Thank you all

Regards
Andre
Re: em(4) ierrs [solved]
On 2010/09/22 17:38, Andre Keller wrote:
> [snip]
> Well, disabling wbng seems to be the solution. After one day of normal
> traffic levels we do not see any Ierrs anymore...
>
> Thank you, Stuart, for the helpful advice. Can somebody explain how
> this driver (which is for reading voltage levels, fan speeds etc., if
> I did not misinterpret the manpage) causes this strange behaviour?
> I'm just curious...

Great, thanks for the feedback. If any code ties up the kernel for too long, it can't handle other tasks in a timely fashion.
Re: em(4) ierrs [solved]
----- Original Message -----
From: Stuart Henderson <s...@spacehopper.org>
To: Andre Keller <a...@list.ak.cx>
Cc: misc@openbsd.org
Sent: Wed, September 22, 2010 8:44:26 AM
Subject: Re: em(4) ierrs [solved]

> [snip]
>
> Great, thanks for the feedback. If any code ties up the kernel for
> too long, it can't handle other tasks in a timely fashion.

I, unfortunately, am still experiencing livelocks on my em interfaces on my Dell R200 server in bridging mode. I'm going to have to schedule an upgrade to the latest snapshot first, to see if that clears up any issues, but barring that I'm not sure where to look. Perhaps I'll also try the UP kernel.

---
James A. Peltier
james_a_pelt...@yahoo.ca
Re: em(4) ierrs [solved]
> I, unfortunately, am still experiencing livelocks on my em interfaces
> on my Dell R200 server in bridging mode. I'm going to have to schedule
> an upgrade to the latest snapshot first, to see if that clears up any
> issues, but barring that I'm not sure where to look. Perhaps I'll also
> try the UP kernel.

http://marc.info/?l=openbsd-misc&m=124082008204226&w=4
Re: em(4) ierrs [solved]
On 2010/09/22 10:04, James Peltier wrote:
> I, unfortunately, am still experiencing livelocks on my em interfaces
> on my Dell R200 server in bridging mode. [snip]

the livelock counter means a timeout wasn't reached in time, indicating the system was too busy to run userland (see m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c, and the video from asiabsdcon starting about 15 minutes into http://www.youtube.com/watch?v=fv-AQJqUzRI).

when this happens, nics with drivers using the MCLGETI mechanism halve the size of their receive rings, so that packets are dropped earlier; this limits system load more effectively than letting them proceed up the network stack.

so, for some reason or other, the timeout wasn't processed quickly enough and the system responds in this way to limit the overload. the challenge is to identify what causes the system to become non-responsive (could be in the network stack or could be for other reasons) and work out ways to alleviate that..
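in outline, the mechanism works something like this (a hand-wavy sketch of the idea only -- the struct and field names below are invented for illustration; the real code is in sys/kern/uipc_mbuf.c and the MCLGETI-aware drivers):

    /* sketch of the livelock detection / rx-ring autosizing idea */

    struct rxring {                 /* invented type, for illustration */
            int lwm;                /* low water mark, e.g. 4 */
            int hwm;                /* high water mark, e.g. 256 */
            int cwm;                /* current watermark (the CWM column) */
    };

    static int m_clticks;           /* advanced from a timeout */
    static int m_livelocks;         /* the LIVELOCKS counter in systat */

    /* runs from a timeout: only advances if softclock gets to run */
    static void
    m_cltick(void)
    {
            m_clticks++;
    }

    /*
     * runs in the driver's rx refill path: if the tick above hasn't
     * moved since last time, softclock is being starved, so count a
     * livelock and halve the receive ring; otherwise creep the ring
     * back up towards the high water mark.
     */
    static int
    m_cldrop(struct rxring *r)
    {
            static int last_clticks;

            if (m_clticks == last_clticks) {
                    m_livelocks++;
                    r->cwm = (r->cwm / 2 > r->lwm) ? r->cwm / 2 : r->lwm;
                    return 1;       /* drop early, before the stack */
            }
            last_clticks = m_clticks;
            if (r->cwm < r->hwm)
                    r->cwm++;
            return 0;
    }

with a smaller ring the nic drops the excess in hardware (you see it as Ierrs) instead of burning cpu pushing packets into a stack that can't keep up.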
Re: em(4) ierrs [solved]
----- Original Message -----
From: Stuart Henderson <s...@spacehopper.org>
To: James Peltier <james_a_pelt...@yahoo.ca>
Cc: Andre Keller <a...@list.ak.cx>; misc@openbsd.org
Sent: Wed, September 22, 2010 12:31:43 PM
Subject: Re: em(4) ierrs [solved]

<snip>

> the livelock counter means a timeout wasn't reached in time,
> indicating the system was too busy to run userland (see m_cltick(),
> m_cldrop() etc. in sys/kern/uipc_mbuf.c, and the video from
> asiabsdcon starting about 15 minutes into
> http://www.youtube.com/watch?v=fv-AQJqUzRI).
> [snip]

Watching now. :)
Re: em(4) ierrs [solved]
----- Original Message -----
From: Stuart Henderson <s...@spacehopper.org>
To: James Peltier <james_a_pelt...@yahoo.ca>
Cc: Andre Keller <a...@list.ak.cx>; misc@openbsd.org
Sent: Wed, September 22, 2010 12:31:43 PM
Subject: Re: em(4) ierrs [solved]

> the livelock counter means a timeout wasn't reached in time,
> indicating the system was too busy to run userland. [snip]
>
> so the challenge is to identify what causes the system to become
> non-responsive (could be in the network stack or could be for other
> reasons) and work out ways to alleviate that..

Thanks for the notes. Below are snapshots of vmstat -i and systat vmstat, which do show "high" interrupt levels (6-12k). I put quotes around "high" because I'm not really sure whether that is high.

That said, is there any benefit to adding the blocknonip clause to the bridge devices? I also note, according to m_cldrop(), that the halving is done on all interfaces. That seems odd: on a box with multiple cards, traffic on all of them would be affected at the expense of one. Am I correct in this?

# vmstat -i
interrupt                  total     rate
irq0/clock             819075628      199
irq0/ipi               208550295
irq112/em0           12478765512     3047
irq113/em1           13607027530     3322
irq113/bge1            126355323
irq97/uhci1                19490
irq96/ehci0                  220
irq98/pciide0           52040391
irq145/com0                 3390
Total                26943565580     6578

and

# systat vmstat (excerpt)
   1 users   Load 0.64 0.67 0.66          Wed Sep 22 16:56:35 2010

                                          Interrupts
   memory totals (in KB)                  11067 total
            real    virtual      free       200 clock
   Active  15388      15388   2918228        48 ipi
   All    383480     383480   6585880      5586 em0
                                           5212 em1
   18.8%Int  1.3%Sys  1.9%Usr                21 bge1
    0.0%Nic 77.9%Idle

---
James A. Peltier
james_a_pelt...@yahoo.ca
Re: em(4) ierrs [solved]
* Stuart Henderson <s...@spacehopper.org> [2010-09-22 21:41]:
> the livelock counter means a timeout wasn't reached in time,
> indicating the system was too busy to run userland (see m_cltick(),
> m_cldrop() etc. in sys/kern/uipc_mbuf.c, and the video from
> asiabsdcon starting about 15 minutes into
> http://www.youtube.com/watch?v=fv-AQJqUzRI).

and this, by itself, isn't necessarily a problem. you just see the rx ring autosizing figuring out the right size.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
Re: em(4) ierrs
On 20.09.2010 19:15, Andre Keller wrote:
> Hi
>
> I have some odd packet loss on an OpenBSD based router (running
> -current as of the beginning of September). The router has 6 physical
> interfaces (all em, Intel 82575EB), 4 of them have traffic (about
> 10-20 Mbps).

Which packet rate do you expect on the interfaces? Do you see livelocks (systat -b mbuf)?

 - Joerg
Re: em(4) ierrs
----- Original Message -----
From: Andre Keller <a...@list.ak.cx>
To: misc@openbsd.org
Cc: James Peltier <james_a_pelt...@yahoo.ca>
Sent: Mon, September 20, 2010 3:51:16 PM
Subject: Re: em(4) ierrs

> On 20.09.2010 19:54, James Peltier wrote:
>> I see you are using LACP as your trunk protocol. You might want to
>> check that all the LACP settings are correct, or that there aren't
>> any links being dropped for some reason that might cause the errors
>> to occur. Additionally, have you tried with only one link in the
>> LACP pairs being active? Does it stop then?
>
> Just tried that. There is not much I can configure for LACP, and on
> the switch I see no errors. I've now pulled one cable so that only
> one interface in the trunk is active. The problem still exists: Ierrs
> on the interfaces (mostly em2). (Btw, there are no ifq.drops.)
>
> It seems to me that some buffers are running full, as now, with low
> traffic, there is only a small number of errors (about 150 in 5
> minutes). Are there any other knobs I could try to tune?

I would be tempted to say: back out all your changes and return to a stock configuration, except for the net.inet.ip.ifq.maxlen parameter. I posted in early August that I was able to push nearly full gigabit speeds with a Dell R200 with 4GB of RAM and a pretty stock configuration. Eventually I had to bump maxlen and the state table, but that's about it. I don't see these problems on a mid-August snapshot; I haven't had a chance to try the latest ones yet, though. The relevant non-stock bits are sketched below.

---
James A. Peltier
james_a_pelt...@yahoo.ca
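For reference, those non-stock bits amount to something like this (a sketch only -- the sysctl is the one discussed in this thread; the pf.conf line is the usual way to raise the state table, and the value is just an example, not what was actually used):

    # /etc/sysctl.conf
    net.inet.ip.ifq.maxlen=2048     # IP input queue length

    # /etc/pf.conf
    set limit states 100000         # example value; the default is 10000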
Re: em(4) ierrs
On 21.09.2010 09:21, Joerg Goltermann wrote:
> On 20.09.2010 19:15, Andre Keller wrote:
>> I have some odd packet loss on an OpenBSD based router (running
>> -current as of the beginning of September). The router has 6
>> physical interfaces (all em, Intel 82575EB), 4 of them have traffic
>> (about 10-20 Mbps).
>
> Which packet rate do you expect on the interfaces? Do you see
> livelocks (systat -b mbuf)?

IFACE         LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
System                    256   9893             805
                           2k    287             985
lo0
em0                3765    2k    113    4  256   113
em1                  43    2k     12    4  256     4
em2                9311    2k    135    4  256   135
em3                 670    2k     12    4  256     4
em4                  43    2k      6    4  256     6
Re: em(4) ierrs
seriously, please try disabling at least wbng; i think there is no point looking at other things until you have tried that.
Re: em(4) ierrs
----- Original Message -----
From: Joerg Goltermann <go...@openbsd.org>
To: Andre Keller <a...@list.ak.cx>
Cc: misc@openbsd.org
Sent: Tue, September 21, 2010 12:21:28 AM
Subject: Re: em(4) ierrs

> [snip]
> Which packet rate do you expect on the interfaces? Do you see
> livelocks (systat -b mbuf)?

Livelocks are seen on my em interfaces as well. I also have livelocks on my far less busy bge1 management interface. See below.

IFACE         LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
System                    256    116              84
                           2k     92             504
lo0
em0               29363    2k     37    4  256    37
em1               10174    2k     37    4  256    37
bge0
bge1                  4    2k     17   17  512    17
enc0
vlan300
bridge0
pflog0
pflow0

---
James A. Peltier
james_a_pelt...@yahoo.ca
Re: em(4) ierrs
----- Original Message -----
From: James Peltier <james_a_pelt...@yahoo.ca>
To: misc@openbsd.org
Cc: misc@openbsd.org
Sent: Tue, September 21, 2010 9:46:40 AM
Subject: Re: em(4) ierrs

> Livelocks are seen on my em interfaces as well. I also have livelocks
> on my far less busy bge1 management interface.
> [snip the systat mbuf table quoted above]

I should mention that these might have been made prior to some recent tuning. However, for the purpose of following this thread, I will keep an eye on it to be sure.
Re: em(4) ierrs
----- Original Message -----
From: James Peltier <james_a_pelt...@yahoo.ca>
To: misc@openbsd.org
Sent: Tue, September 21, 2010 9:51:05 AM
Subject: Re: em(4) ierrs

> I should mention that these might have been made prior to some recent
> tuning. However, for the purpose of following this thread, I will
> keep an eye on it to be sure.

I am in bridging mode, and I too am indeed seeing a slow increase in livelocks on my em interfaces. Traffic has been quite low over the past week or so, so it certainly shouldn't be an issue. The only modification I have made thus far is net.inet.ip.ifq.maxlen, bumped to 2048. If you want any other info, please let me know.

#sysctl -b mbuf

   1 users   Load 0.13 0.09 0.08          Tue Sep 21 20:22:30 2010

IFACE         LIVELOCKS  SIZE  ALIVE  LWM  HWM  CWM
System                    256     98              84
                           2k     74             504
lo0
em0               29891    2k     29    4  256    29
em1               10381    2k     28    4  256    28
bge0
bge1                  4    2k     17   17  512    17
enc0
vlan300
bridge0
pflog0
pflow0

# netstat -m
100 mbufs in use:
        95 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
74/1008/6144 mbuf 2048 byte clusters in use (current/peak/max)
0/8/6144 mbuf 4096 byte clusters in use (current/peak/max)
0/8/6144 mbuf 8192 byte clusters in use (current/peak/max)
0/8/6144 mbuf 9216 byte clusters in use (current/peak/max)
0/8/6144 mbuf 12288 byte clusters in use (current/peak/max)
0/8/6144 mbuf 16384 byte clusters in use (current/peak/max)
0/8/6144 mbuf 65536 byte clusters in use (current/peak/max)
2544 Kbytes allocated to network (6% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

---
James A. Peltier
james_a_pelt...@yahoo.ca
Re: em(4) ierrs
On Tue, Sep 21, 2010 at 08:31:16PM -0700, James Peltier wrote:
> I am in bridging mode, and I too am indeed seeing a slow increase in
> livelocks on my em interfaces. Traffic has been quite low over the
> past week or so, so it certainly shouldn't be an issue. The only
> modification I have made thus far is net.inet.ip.ifq.maxlen, bumped
> to 2048. If you want any other info, please let me know.

If you use bridge(4), net.inet.ip.ifq.maxlen will not change anything, since that queue is only used for incoming IP traffic. bridge(4) steals the packets beforehand and has its own ifq.
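In sketch form (all names below are invented for illustration; the real path is ether_input() handing packets on bridge members to bridge(4) instead of to the IP input queue):

    /* why net.inet.ip.ifq.maxlen doesn't help bridged traffic */

    struct mbuf;                    /* opaque here */

    struct ifqueue {
            int len;
            int maxlen;             /* the tunable limit */
            int drops;              /* shows up as <queue>.drops */
    };

    /* net.inet.ip.ifq.maxlen sizes only this queue ... */
    struct ifqueue ipintrq = { 0, 2048, 0 };
    /* ... while bridge(4) runs its own, separately sized queue */
    struct ifqueue bridgeq = { 0,  256, 0 };

    int
    ifq_put(struct ifqueue *q, struct mbuf *m)
    {
            (void)m;
            if (q->len >= q->maxlen) {
                    q->drops++;     /* packet dropped at the queue */
                    return -1;
            }
            q->len++;
            return 0;
    }

    /* sketch of ethernet input: packets arriving on a bridge member
     * never reach ipintrq, so tuning it does nothing for them */
    int
    ether_input_sketch(int on_bridge_member, struct mbuf *m)
    {
            if (on_bridge_member)
                    return ifq_put(&bridgeq, m);   /* "stolen" here */
            return ifq_put(&ipintrq, m);           /* host/routed IP */
    }

-- 
:wq Claudio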
Re: em(4) ierrs
On Tue, Sep 21, 2010 at 8:31 PM, James Peltier <james_a_pelt...@yahoo.ca> wrote:
> I am in bridging mode, and I too am indeed seeing a slow increase in
> livelocks on my em interfaces. [snip]
>
> #sysctl -b mbuf

sure is a funny version of sysctl you are using there.

> [snip the systat mbuf and netstat -m output quoted above]
Re: em(4) ierrs
----- Original Message -----
From: Andre Keller <a...@list.ak.cx>
To: misc@openbsd.org
Sent: Mon, September 20, 2010 10:15:58 AM
Subject: em(4) ierrs

Hi

I have some odd packet loss on an OpenBSD based router (running -current as of the beginning of September). The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them have traffic (about 10-20 Mbps).

We did some tuning (mostly with information from https://calomel.org/network_performance.html) and could improve the performance. Currently we use the following sysctl tweaks:

sysctl kern.maxclusters=122880
sysctl net.inet.ip.ifq.maxlen=1536
sysctl net.inet.tcp.recvspace=262144
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.udp.recvspace=262144
sysctl net.inet.udp.sendspace=262144

But still we have about 1300 Ierrs per minute... When we run a simple ping, we can see that something is strange: where the majority of packets have an rtt of 1 ms or less, about every tenth packet shows an rtt of 250 ms...

I could really use a hint on what to try next (autoneg has been disabled on all interfaces for testing; now it has been enabled again...).

Thank you for your inputs
Andri Keller

The switches on the other end of the device are both Cisco 2960G, with an LACP trunk to two interfaces on the OpenBSD box:

em0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546c%em0 prefixlen 64 scopeid 0x1
em1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkdev trunk1
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546d%em1 prefixlen 64 scopeid 0x2
em2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6e
        priority: 0
        trunk: trunkdev trunk0
        media: Ethernet 1000baseT full-duplex
        status: active
        inet6 fe80::225:90ff:fe05:546e%em2 prefixlen 64 scopeid 0x3
em3: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6e
        priority: 0
        trunk: trunkdev trunk0
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet6 fe80::225:90ff:fe05:546f%em3 prefixlen 64 scopeid 0x4
trunk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6e
        priority: 0
        trunk: trunkproto lacp
        trunk id: [(8000,00:25:90:05:54:6e,4054,,),
            (8000,18:ef:63:bf:d7:00,0002,,)]
        trunkport em3 active,collecting,distributing
        trunkport em2 active,collecting,distributing
        groups: trunk
        media: Ethernet autoselect
        status: active
        inet ADDRESS REMOVED
        inet6 fe80::225:90ff:fe05:546e%trunk0 prefixlen 64 scopeid 0xa
        inet6 ADDRESS REMOVED
trunk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        trunk: trunkproto lacp
        trunk id: [(8000,00:25:90:05:54:6c,405C,,),
            (8000,18:ef:63:bf:d7:00,0003,,)]
        trunkport em1 active,collecting,distributing
        trunkport em0 active,collecting,distributing
        groups: trunk
        media: Ethernet autoselect
        status: active
        inet6 fe80::225:90ff:fe05:546c%trunk1 prefixlen 64 scopeid 0xb
vlan56: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:25:90:05:54:6c
        priority: 0
        vlan: 56 priority: 0 parent interface: trunk1
        groups: vlan
        status: active
        inet6 fe80::225:90ff:fe05:546c%vlan56 prefixlen 64 scopeid 0x11
        inet ADDRESS REMOVED

netstat -m
9023 mbufs in use:
        9003 mbufs allocated to data
        11 mbufs allocated to packet headers
        9 mbufs allocated to socket names and addresses
528/1970/512000 mbuf 2048 byte clusters in use (current/peak/max)
0/8/512000 mbuf 4096 byte clusters in use (current/peak/max)
0/8/512000 mbuf 8192 byte clusters in use (current/peak/max)
0/8/512000 mbuf 9216 byte clusters in use (current/peak/max)
0/8/512000 mbuf 12288 byte clusters in use (current/peak/max)
0/8/512000 mbuf 16384 byte clusters in use (current/peak/max)
0/8/512000 mbuf 65536 byte clusters in use (current/peak/max)
7060 Kbytes allocated to network (46% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
Re: em(4) ierrs
On 2010-09-20, Andre Keller <a...@list.ak.cx> wrote:
> I have some odd packet loss on an OpenBSD based router (running
> -current as of the beginning of September). The router has 6 physical
> interfaces (all em, Intel 82575EB), 4 of them have traffic (about
> 10-20 Mbps).
>
> We did some tuning (mostly with information from
> https://calomel.org/network_performance.html) and could improve the
> performance:

grr, that page again.

  "As a very general rule, using the on-board network card is going to
  be much slower than an add in PCI card" ... "A gigabit network
  controller built on board using the CPU will slow the entire system
  down. More than likely the system will not even be able to sustain
  100MB speeds while also pegging the CPU at 100%."

and people still use it for kernel tuning advice?

> sysctl kern.maxclusters=122880

how much?!!

> sysctl net.inet.ip.ifq.maxlen=1536

increasing this from the default can be useful if you see drops in net.inet.ip.ifq.drops; I'm surprised if you have to go that high for 4x 10-20Mb.

> sysctl net.inet.tcp.recvspace=262144
> sysctl net.inet.tcp.sendspace=262144
> sysctl net.inet.udp.recvspace=262144
> sysctl net.inet.udp.sendspace=262144

the net.inet.*space values HAVE NO EFFECT on routed packets.

> But still we have about 1300 Ierrs per minute... When we run a simple
> ping, we can see that something is strange: where the majority of
> packets have an rtt of 1 ms or less, about every tenth packet shows
> an rtt of 250 ms...

missing dmesg. but try disabling sensor devices or i2c controllers (boot -c, disable somedevice, quit).
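e.g. something like this at the boot prompt (a sketch -- see boot_config(8); the exact messages vary, and the change only lasts for that boot):

    boot> boot -c
    [kernel boots into the user kernel config editor]
    UKC> disable wbng
    UKC> quit
    [boot continues with wbng disabled]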
Re: em(4) ierrs
On 20.09.2010 19:54, James Peltier wrote:
> I see you are using LACP as your trunk protocol. You might want to
> check that all the LACP settings are correct, or that there aren't
> any links being dropped for some reason that might cause the errors
> to occur. Additionally, have you tried with only one link in the LACP
> pairs being active? Does it stop then?

Just tried that. There is not much I can configure for LACP, and on the switch I see no errors. I've now pulled one cable so that only one interface in the trunk is active. The problem still exists: Ierrs on the interfaces (mostly em2). (Btw, there are no ifq.drops.)

It seems to me that some buffers are running full, as now, with low traffic, there is only a small number of errors (about 150 in 5 minutes). Are there any other knobs I could try to tune?

Regards
Andri
Re: em(4) ierrs
On 21.09.2010 00:43, Stuart Henderson wrote:
> On 2010-09-20, Andre Keller <a...@list.ak.cx> wrote:
>> We did some tuning (mostly with information from
>> https://calomel.org/network_performance.html) and could improve the
>> performance:
>
> grr, that page again. [snip]
> and people still use it for kernel tuning advice?

As we didn't find any other advice out there, we thought it might be worth giving it a try.

>> sysctl kern.maxclusters=122880
>
> how much?!!

Yes, this might be a bit too much:

[r...@rt01-rc: root]# netstat -m
9665 mbufs in use:
        9642 mbufs allocated to data
        14 mbufs allocated to packet headers
        9 mbufs allocated to socket names and addresses
83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)
0/8/122880 mbuf 4096 byte clusters in use (current/peak/max)
0/8/122880 mbuf 8192 byte clusters in use (current/peak/max)
0/8/122880 mbuf 9216 byte clusters in use (current/peak/max)
0/8/122880 mbuf 12288 byte clusters in use (current/peak/max)
0/8/122880 mbuf 16384 byte clusters in use (current/peak/max)
0/8/122880 mbuf 65536 byte clusters in use (current/peak/max)
7288 Kbytes allocated to network (35% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines

>> sysctl net.inet.ip.ifq.maxlen=1536
>
> increasing this from the default can be useful if you see drops in
> net.inet.ip.ifq.drops; I'm surprised if you have to go that high for
> 4x 10-20Mb.

Yeah, we had a lot of ifq drops at first, and after setting this value they are gone... I read in multiple tuning tutorials that setting this to 256 * interface count makes sense (6 interfaces x 256 = 1536).

>> sysctl net.inet.tcp.recvspace=262144
>> sysctl net.inet.tcp.sendspace=262144
>> sysctl net.inet.udp.recvspace=262144
>> sysctl net.inet.udp.sendspace=262144
>
> the net.inet.*space values HAVE NO EFFECT on routed packets.

OK, good to know...

> missing dmesg.

Not from the machine above, but from a machine with exactly the same hardware:

OpenBSD 4.8 (GENERIC.MP) #3: Wed Aug 11 19:24:59 CEST 2010
    r...@scaramanga.rbnetwork.biz:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.3a date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5) USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5) USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.43 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu0: 4MB 64b/line 16-way L2 cache
cpu0: apic clock running at 266MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu1: 4MB 64b/line 16-way L2 cache
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu2: 4MB 64b/line 16-way L2 cache
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu3: 4MB 64b/line 16-way L2 cache
Re: em(4) ierrs
On 2010/09/21 01:07, Andre Keller wrote:
> ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 4 int 17 (irq 10)
> iic0 at ichiic0
> lm1 at iic0 addr 0x2d: W83627HF
> wbng0 at iic0 addr 0x2f: w83793g
>
>> but try disabling sensor devices or i2c controllers (boot -c,
>> disable somedevice, quit).
>
> I'll try to find out what devices I could disable...

I would try wbng first. Failing that, lm. I doubt you would need to disable ichiic, but that would be the next step if there's no improvement. You can make permanent changes to an on-disk kernel with config(8).

> Thank you for your hints...

Please follow up and let us know how it goes.
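P.S. if you end up making it permanent, something like this (a sketch only -- see config(8); the editor prompt and output vary between releases):

    # config -e -o /bsd.new /bsd
    ukc> disable wbng
    ukc> quit
    # cp /bsd /bsd.orig
    # mv /bsd.new /bsd

then reboot, keeping /bsd.orig around in case you want to back the change out.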
Re: em(4) ierrs
* Stuart Henderson <s...@spacehopper.org> [2010-09-21 00:47]:
> On 2010-09-20, Andre Keller <a...@list.ak.cx> wrote:
>> We did some tuning (mostly with information from
>> https://calomel.org/network_performance.html) and could improve the
>> performance:
>
> grr, that page again.
>
>   "As a very general rule, using the on-board network card is going
>   to be much slower than an add in PCI card" ... "A gigabit network
>   controller built on board using the CPU will slow the entire system
>   down. More than likely the system will not even be able to sustain
>   100MB speeds while also pegging the CPU at 100%."
>
> and people still use it for kernel tuning advice?

holy shit. that is indeed horribly wrong. in many cases it is the exact opposite of the truth these days.

>> sysctl net.inet.tcp.recvspace=262144
>> sysctl net.inet.tcp.sendspace=262144
>> sysctl net.inet.udp.recvspace=262144
>> sysctl net.inet.udp.sendspace=262144
>
> the net.inet.*space values HAVE NO EFFECT on routed packets.

as said a gazillion times.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
Re: em(4) ierrs
* Andre Keller <a...@list.ak.cx> [2010-09-21 01:10]:
> As we didn't find any other advice out there, we thought it might be
> worth giving it a try.

ok, here's some other advice that you might wanna follow, since you don't find another: to make your system run faster, donate all your belongings to openbsd, then dance naked around the computer and eat nothing but rice all day. after a few days, throw the computer into the ocean. it'll be very fast (to sink).

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting