Re: em(4) ierrs [solved]

2010-09-22 Thread Andre Keller
Hi Stuart

On 21.09.2010 01:28, Stuart Henderson wrote:
 I would try wbng first. Failing that, lm. I doubt you would
 need to disable ichiic but that would be the next step if there's
 no improvement. 

Well, disabling wbng seems to be the solution. After one day of normal
traffic levels we do not see any Ierrs anymore...

Thank you Stuart for the helpful advice.


Can somebody explain how this driver (which is for getting voltage
levels, fan speeds, etc., if I did not misinterpret the manpage) is
causing this strange behavior? I'm just curious...


Thank you all


Regards Andre



Re: em(4) ierrs [solved]

2010-09-22 Thread Stuart Henderson
On 2010/09/22 17:38, Andre Keller wrote:
 Hi Stuart
 
 On 21.09.2010 01:28, Stuart Henderson wrote:
  I would try wbng first. Failing that, lm. I doubt you would
  need to disable ichiic but that would be the next step if there's
  no improvement. 
 
 Well, disabling wbng seems to be the solution. After one day of normal
 traffic levels we do not see any Ierrs anymore...
 
 Thank you Stuart for the helpful advice.
 
 
 Can somebody explain how this driver (which is for getting voltage
 levels, fan speeds, etc., if I did not misinterpret the manpage) is
 causing this strange behavior? I'm just curious...

Great, thanks for the feedback.

If any code ties up the kernel for too long, it can't handle
other tasks in a timely fashion. 
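
As a rough illustration of the idea (made-up code, not the actual
wbng(4)/ichiic(4) source): a sensor refresh that busy-polls a slow SMBus
device can keep a CPU occupied for long stretches, during which receive
interrupts and timeouts get serviced late; that would also fit the ~250ms
ping spikes mentioned earlier in the thread.

#include <stdio.h>

/*
 * Hypothetical polled sensor read.  While the loop below spins, nothing
 * else runs on that CPU, so packets back up in the NIC and eventually
 * show up as Ierrs.
 */
static int busy_countdown = 5000000;    /* stand-in for slow hardware */

static int
smbus_ready(void)
{
        return (--busy_countdown <= 0); /* the "chip" stays busy a while */
}

static int
read_sensor(void)
{
        /* polled wait with no sleep and no yield: this is the problem */
        while (!smbus_ready())
                ;
        return (42);                    /* pretend voltage/temperature */
}

int
main(void)
{
        printf("sensor value: %d\n", read_sensor());
        return (0);
}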



Re: em(4) ierrs [solved]

2010-09-22 Thread James Peltier
- Original Message 

 From: Stuart Henderson s...@spacehopper.org
 To: Andre Keller a...@list.ak.cx
 Cc: misc@openbsd.org
 Sent: Wed, September 22, 2010 8:44:26 AM
 Subject: Re: em(4) ierrs [solved]
 
 On 2010/09/22 17:38, Andre Keller wrote:
  Hi Stuart
  
  On 21.09.2010 01:28, Stuart Henderson wrote:
   I would try wbng first. Failing that, lm. I doubt you would
   need to disable ichiic but that would be the next step if there's
   no improvement.
  
  Well, disabling wbng seems to be the solution. After one day of normal
  traffic levels we do not see any Ierrs anymore...
  
  Thank you Stuart for the helpful advice.
  
  
  Can somebody explain how this driver (which is for getting voltage
  levels, fan speeds, etc., if I did not misinterpret the manpage) is
  causing this strange behavior? I'm just curious...
 
 Great, thanks for the feedback.
 
 If any code ties up the kernel for too long, it can't handle
 other tasks in a timely fashion.
 


I, unfortunately, am still experiencing livelocks on my em interfaces on my
Dell R200 server in bridging mode. I'm going to have to schedule an upgrade
to the latest snapshot first to see if that clears up any issues, but barring
that I'm not sure where to look. Perhaps I'll also try the UP kernel.

---
James A. Peltier james_a_pelt...@yahoo.ca



Re: em(4) ierrs [solved]

2010-09-22 Thread Dave Del Debbio
I, unfortunately, am still experiencing livelocks on my em interfaces on my
Dell R200 server in bridging mode. I'm going to have to schedule an upgrade
to the latest snapshot first to see if that clears up any issues, but barring
that I'm not sure where to look. Perhaps I'll also try the UP kernel.

http://marc.info/?l=openbsd-misc&m=124082008204226&w=4



Re: em(4) ierrs [solved]

2010-09-22 Thread Stuart Henderson
On 2010/09/22 10:04, James Peltier wrote:
 - Original Message 
 
  From: Stuart Henderson s...@spacehopper.org
  To: Andre Keller a...@list.ak.cx
  Cc: misc@openbsd.org
  Sent: Wed, September 22, 2010 8:44:26 AM
  Subject: Re: em(4) ierrs [solved]
  
  On 2010/09/22 17:38, Andre Keller wrote:
   Hi Stuart
   
   On 21.09.2010 01:28, Stuart Henderson wrote:
    I would try wbng first. Failing that, lm. I doubt you would
    need to disable ichiic but that would be the next step if there's
    no improvement.
   
   Well, disabling wbng seems to be the solution. After one day of normal
   traffic levels we do not see any Ierrs anymore...
   
   Thank you Stuart for the helpful advice.
   
   
   Can somebody explain how this driver (which is for getting voltage
   levels, fan speeds, etc., if I did not misinterpret the manpage) is
   causing this strange behavior? I'm just curious...
  
  Great, thanks for the feedback.
  
  If any code ties up the kernel for too long, it can't handle
  other tasks in a timely fashion.
  
 
 
 I, unfortunately, am still experiencing livelocks on my em interfaces on my
 Dell R200 server in bridging mode. I'm going to have to schedule an upgrade
 to the latest snapshot first to see if that clears up any issues, but barring
 that I'm not sure where to look. Perhaps I'll also try the UP kernel.

The livelock counter means a timeout wasn't reached in time,
indicating the system was too busy to run userland.
(See m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c,
and the video from AsiaBSDCon starting about 15 minutes into
http://www.youtube.com/watch?v=fv-AQJqUzRI).

When this happens, NICs with drivers using the MCLGETI mechanism
halve the size of their receive rings, so that packets drop
earlier, which limits system load more effectively than letting
them proceed up the network stack.

So for some reason or other the timeout wasn't processed
quickly enough, and the system responds in this way to limit
the overload. The challenge is to identify what causes
the system to become non-responsive (could be in the network
stack or could be for other reasons) and work out ways
to alleviate that.
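
For anyone curious what that looks like in practice, here is a small
userland model of the idea (simplified, not the real uipc_mbuf.c code;
the lwm/hwm/cwm names are borrowed from the systat mbuf columns): a soft
tick driven from timeout context is compared against the hard clock, and
when it falls behind, the per-interface ring watermark is halved.

#include <stdio.h>

struct rxring {
        int lwm;        /* low watermark  */
        int hwm;        /* high watermark */
        int cwm;        /* current watermark, i.e. ring slots in use */
};

static int hardclock_ticks;     /* advanced by the "interrupt" side */
static int softclock_ticks;     /* advanced from "timeout" context  */

/* runs from timeout context; only keeps up if the system isn't livelocked */
static void
cl_tick(void)
{
        softclock_ticks = hardclock_ticks;
}

/* asked by the driver rx path: should this packet be dropped early? */
static int
cl_drop(struct rxring *r)
{
        if (hardclock_ticks - softclock_ticks > 1) {
                /* timeouts are running late: livelock, shrink the ring */
                r->cwm = (r->cwm / 2 > r->lwm) ? r->cwm / 2 : r->lwm;
                softclock_ticks = hardclock_ticks;      /* count it once */
                return (1);
        }
        if (r->cwm < r->hwm)
                r->cwm++;       /* healthy again: grow back slowly */
        return (0);
}

int
main(void)
{
        struct rxring em0 = { 4, 256, 256 };
        int tick;

        for (tick = 0; tick < 8; tick++) {
                hardclock_ticks++;
                if (tick < 4)
                        cl_tick();      /* userland keeps up at first... */
                cl_drop(&em0);          /* ...then falls behind */
                printf("tick %d: cwm = %d\n", tick, em0.cwm);
        }
        return (0);
}

The real code obviously has more to it, but the shape is the same: notice
that timeouts are running late, halve cwm, and let the driver drop at the
ring instead of feeding a stack that can't keep up.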



Re: em(4) ierrs [solved]

2010-09-22 Thread James Peltier
- Original Message 

 From: Stuart Henderson s...@spacehopper.org
 To: James Peltier james_a_pelt...@yahoo.ca
 Cc: Andre Keller a...@list.ak.cx; misc@openbsd.org
 Sent: Wed, September 22, 2010 12:31:43 PM
 Subject: Re: em(4) ierrs [solved]
snip
  I, unfortunately, am still experiencing livelocks on my em interfaces on my
  Dell R200 server in bridging mode. I'm going to have to schedule an upgrade
  to the latest snapshot first to see if that clears up any issues, but barring
  that I'm not sure where to look. Perhaps I'll also try the UP kernel.
 
 The livelock counter means a timeout wasn't reached in time,
 indicating the system was too busy to run userland.
 (See m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c,
 and the video from AsiaBSDCon starting about 15 minutes into
 http://www.youtube.com/watch?v=fv-AQJqUzRI).
 
 When this happens, NICs with drivers using the MCLGETI mechanism
 halve the size of their receive rings, so that packets drop
 earlier, which limits system load more effectively than letting
 them proceed up the network stack.
 
 So for some reason or other the timeout wasn't processed
 quickly enough, and the system responds in this way to limit
 the overload. The challenge is to identify what causes
 the system to become non-responsive (could be in the network
 stack or could be for other reasons) and work out ways
 to alleviate that.
 

Watching now. :)



Re: em(4) ierrs [solved]

2010-09-22 Thread James Peltier
 - Original Message 

 From: Stuart Henderson s...@spacehopper.org
 To: James Peltier james_a_pelt...@yahoo.ca
 Cc: Andre Keller a...@list.ak.cx; misc@openbsd.org
 Sent: Wed, September 22, 2010 12:31:43 PM
 Subject: Re: em(4) ierrs [solved]
 
 
 The livelock counter means a timeout wasn't reached in time,
 indicating the system was too busy to run userland.
 (See m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c,
 and the video from AsiaBSDCon starting about 15 minutes into
 http://www.youtube.com/watch?v=fv-AQJqUzRI).
 
 When this happens, NICs with drivers using the MCLGETI mechanism
 halve the size of their receive rings, so that packets drop
 earlier, which limits system load more effectively than letting
 them proceed up the network stack.
 
 So for some reason or other the timeout wasn't processed
 quickly enough, and the system responds in this way to limit
 the overload. The challenge is to identify what causes
 the system to become non-responsive (could be in the network
 stack or could be for other reasons) and work out ways
 to alleviate that.
 


Thanks for the notes. Below are snapshots of vmstat -i and systat vmstat,
which do show "high" interrupt levels (6-12k). I put quotes around "high"
because I'm not really sure whether that is actually high.

That said, is there any benefit to adding the blocknonip option to the
bridge devices?

I also note that, according to m_cldrop(), the halving is done on all
interfaces. This seems odd, in that on a box with multiple cards all traffic
would be affected at the expense of one. Am I correct in this?


# vmstat -i
interrupt                    total     rate
irq0/clock               819075628      199
irq0/ipi                 208550295
irq112/em0             12478765512     3047
irq113/em1             13607027530     3322
irq113/bge1              126355323
irq97/uhci1                  19490
irq96/ehci0                    220
irq98/pciide0             52040391
irq145/com0                   3390
Total                  26943565580     6578


and

#systat vmstat

   1 users     Load 0.64 0.67 0.66                   Wed Sep 22 16:56:35 2010

memory totals (in KB)PAGING   SWAPPING Interrupts
   real   virtual free   in  out   in  out11067 total
Active15388 15388  2918228   ops200 clock
All  383480383480  6585880   pages   48 ipi
   5586 em0
Proc:r  d  s  wCsw   Trp   Sys   Int   Sof  Flt 1 forks5212 em1
   7   101   561  1525  9438   105  595   fkppw  21 bge1
  fksvm uhci1
  18.8%Int   1.3%Sys   1.9%Usr   0.0%Nic  77.9%Idle   pwait ehci0
|||||||||||   relck pciide0
|=   rlkok com0
  noram
Namei Sys-cacheProc-cacheNo-cache  96 ndcpy
Calls hits%hits %miss   %  18 fltcp
   55   55  100   106 zfod
   31 cow
Disks   wd0   cd0   27514 fmin
seeks   36685 ftarg
xfers itarg
speed  17 wired
  sec pdfre
  pdscn
  pzidle
   13 kmapent


---
James A. Peltier james_a_pelt...@yahoo.ca



Re: em(4) ierrs [solved]

2010-09-22 Thread Henning Brauer
* Stuart Henderson s...@spacehopper.org [2010-09-22 21:41]:
 The livelock counter means a timeout wasn't reached in time,
 indicating the system was too busy to run userland.
 (See m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c,
 and the video from AsiaBSDCon starting about 15 minutes into
 http://www.youtube.com/watch?v=fv-AQJqUzRI).

And this, by itself, isn't necessarily a problem; you are just seeing the rx
ring autosizing figure out the right size.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: em(4) ierrs

2010-09-21 Thread Joerg Goltermann

On 20.09.2010 19:15, Andre Keller wrote:

Hi


I have some odd packet loss on an OpenBSD-based router (running -current
as of the beginning of September).

The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
have traffic (about 10-20 Mbps).


Which packet rate do you expect on the interfaces? Do you see
livelocks (systat -b mbuf)?

 - Joerg



Re: em(4) ierrs

2010-09-21 Thread James Peltier
- Original Message 
 From: Andre Keller a...@list.ak.cx
 To: misc@openbsd.org
 Cc: James Peltier james_a_pelt...@yahoo.ca
 Sent: Mon, September 20, 2010 3:51:16 PM
 Subject: Re: em(4) ierrs
 
 On 20.09.2010 19:54, James Peltier wrote:
  I see you are using LACP as your trunk protocol.  You might want to check
  that all the LACP settings are correct or that there aren't any links being
  dropped for some reason that might cause the errors to occur.  Additionally,
  have you tried with only one link in the LACP pairs being active?  Does it
  stop then?
 
 Just tried that. There is not much I can configure for LACP. On the
 switch I see no errors.
 
 I've now pulled one cable so that only one interface in the trunk is
 active. The problem still exists: Ierrs on the interfaces (mostly
 em2) (btw. there are no ifq.drops).
 It seems to me that some buffers are running full, as now when there is
 low traffic there are only a small number of errors (about 150 in 5 minutes).
 
 Are there any other knobs I could try to tune?
 
 
 Regards Andri


I would be tempted to say back out all your changes and return to a stock
configuration, except for the net.inet.ip.ifq.maxlen parameter.

I posted in early August that I was able to push nearly full gigabit speeds
with a Dell R200 w/4GB of RAM and a pretty stock configuration. Eventually I
had to bump maxlen and the state table but that's about it. I don't see these
problems on a mid-August snapshot. I haven't had a chance to try the latest
ones yet though.


 ---
James A. Peltier james_a_pelt...@yahoo.ca



Re: em(4) ierrs

2010-09-21 Thread Andre Keller
On 21.09.2010 09:21, Joerg Goltermann wrote:
 On 20.09.2010 19:15, Andre Keller wrote:
 Hi


 I have some odd packet loss on an OpenBSD-based router (running -current
 as of the beginning of September).

 The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
 have traffic (about 10-20 Mbps).

 Which packet rate do you expect on the interfaces? Do you see
 livelocks (systat -b mbuf)?

IFACE    LIVELOCKS  SIZE  ALIVE  LWM  HWM   CWM
System               256   9893              805
                      2k    287              985
lo0
em0           3765    2k    113    4  256    113
em1             43    2k     12    4  256      4
em2           9311    2k    135    4  256    135
em3            670    2k     12    4  256      4
em4             43    2k      6    4  256      6



Re: em(4) ierrs

2010-09-21 Thread Stuart Henderson
Seriously, please try disabling at least wbng; I think there is no
point looking at other things until you have tried that.



Re: em(4) ierrs

2010-09-21 Thread James Peltier
- Original Message 

 From: Joerg Goltermann go...@openbsd.org
 To: Andre Keller a...@list.ak.cx
 Cc: misc@openbsd.org
 Sent: Tue, September 21, 2010 12:21:28 AM
 Subject: Re: em(4) ierrs
 
 On 20.09.2010 19:15, Andre Keller wrote:
  Hi
 
 
  I have some odd packet loss on an OpenBSD-based router (running -current
  as of the beginning of September).
 
  The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
  have traffic (about 10-20 Mbps).
 
 Which packet rate do you expect on the interfaces? Do you see
 livelocks (systat -b mbuf)?
 
   - Joerg


Livelocks are seen on my em interfaces as well. I also have livelocks on my
far less busy bge1 management interface. See below.

IFACE    LIVELOCKS  SIZE  ALIVE  LWM  HWM   CWM
System               256    116               84
                      2k     92              504
lo0
em0          29363    2k     37    4  256     37
em1          10174    2k     37    4  256     37
bge0
bge1             4    2k     17   17  512     17
enc0
vlan300
bridge0
pflog0
pflow0


 ---
James A. Peltier james_a_pelt...@yahoo.ca



Re: em(4) ierrs

2010-09-21 Thread James Peltier
- Original Message 

 From: James Peltier james_a_pelt...@yahoo.ca
 To: misc@openbsd.org
 Cc: misc@openbsd.org
 Sent: Tue, September 21, 2010 9:46:40 AM
 Subject: Re: em(4) ierrs
 
 - Original Message 
 
  From: Joerg Goltermann go...@openbsd.org
  To: Andre  Keller a...@list.ak.cx
  Cc: misc@openbsd.org
  Sent: Tue, September  21, 2010 12:21:28 AM
  Subject: Re: em(4) ierrs
  
   On 20.09.2010 19:15, Andre Keller wrote:
    Hi
   
   
    I have some odd packet loss on an OpenBSD-based router (running -current
    as of the beginning of September).
   
    The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
    have traffic (about 10-20 Mbps).
   
   Which packet rate do you expect on the interfaces? Do you see
   livelocks (systat -b mbuf)?
   
     - Joerg
 
 
  Livelocks are seen on my em interfaces as well. I also have livelocks on my
  far less busy bge1 management interface. See below.
 
  IFACE    LIVELOCKS  SIZE  ALIVE  LWM  HWM   CWM
  System               256    116               84
                        2k     92              504
  lo0
  em0          29363    2k     37    4  256     37
  em1          10174    2k     37    4  256     37
  bge0
  bge1             4    2k     17   17  512     17
  enc0
  vlan300
  bridge0
  pflog0
  pflow0


I should mention that these might have been made prior to some recent tuning.
However, for the purpose of following this thread I will keep an eye on it to
be sure.



Re: em(4) ierrs

2010-09-21 Thread James Peltier
- Original Message 

 From: James Peltier james_a_pelt...@yahoo.ca
 To: misc@openbsd.org
 Sent: Tue, September 21, 2010 9:51:05 AM
 Subject: Re: em(4) ierrs
 
 snip
 


I am in bridging mode and I too am indeed seeing a slow increase in livelocks
on my em0 interfaces. Traffic has been quite low over the past week or so, so
it certainly shouldn't be an issue. The only modification I have made thus far
is net.inet.ip.ifq.maxlen bumped to 2048. If you want any other info,
please let me know.


#sysctl -b mbuf
   1 users     Load 0.13 0.09 0.08                   Tue Sep 21 20:22:30 2010

IFACE    LIVELOCKS  SIZE  ALIVE  LWM  HWM   CWM
System               256     98               84
                      2k     74              504
lo0
em0          29891    2k     29    4  256     29
em1          10381    2k     28    4  256     28
bge0
bge1             4    2k     17   17  512     17
enc0
vlan300
bridge0
pflog0
pflow0


# netstat -m
100 mbufs in use:
95 mbufs allocated to data
1 mbuf allocated to packet headers
4 mbufs allocated to socket names and addresses
74/1008/6144 mbuf 2048 byte clusters in use (current/peak/max)
0/8/6144 mbuf 4096 byte clusters in use (current/peak/max)
0/8/6144 mbuf 8192 byte clusters in use (current/peak/max)
0/8/6144 mbuf 9216 byte clusters in use (current/peak/max)
0/8/6144 mbuf 12288 byte clusters in use (current/peak/max)
0/8/6144 mbuf 16384 byte clusters in use (current/peak/max)
0/8/6144 mbuf 65536 byte clusters in use (current/peak/max)
2544 Kbytes allocated to network (6% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
#

 ---
James A. Peltier james_a_pelt...@yahoo.ca



Re: em(4) ierrs

2010-09-21 Thread Claudio Jeker
On Tue, Sep 21, 2010 at 08:31:16PM -0700, James Peltier wrote:
 I am in bridging mode and I too, am indeed seeing a slow increase in
 livelocks on my em0 interfaces.  Traffic has been quite low over the
 past week or so, so it certainly shouldn't be an issue.  The only
 modification I have made thus far is net.inet.ip.ifq.maxlen bumped to
 2048.  If you want any other info please let me know.
 

If you use bridge(4), net.inet.ip.ifq.maxlen will not change anything, since
that queue is only used for incoming IP traffic. bridge(4) steals the
packets beforehand and has its own ifq.
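
A tiny model of the separation (made-up structures and numbers, not the
kernel's; just to illustrate the point): the queue that sysctl tunes and
the queue bridge(4) fills are different objects, so a generous maxlen on
one does nothing for drops on the other.

#include <stdio.h>

struct pktq {
        const char *name;
        int len, maxlen, drops;
};

static void
enqueue(struct pktq *q)
{
        if (q->len >= q->maxlen)
                q->drops++;     /* queue full: the packet is dropped here */
        else
                q->len++;
}

int
main(void)
{
        /* maxlen values are illustrative, not the real defaults */
        struct pktq ipintrq = { "ipintrq (net.inet.ip.ifq)", 0, 2048, 0 };
        struct pktq bridgeq = { "bridge(4) input queue",     0,  256, 0 };
        int i;

        /* bridged frames never touch ipintrq, so its big maxlen sits idle */
        for (i = 0; i < 1000; i++)
                enqueue(&bridgeq);

        printf("%s: drops = %d\n", ipintrq.name, ipintrq.drops);
        printf("%s: drops = %d\n", bridgeq.name, bridgeq.drops);
        return (0);
}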

-- 
:wq Claudio



Re: em(4) ierrs

2010-09-21 Thread patrick keshishian
On Tue, Sep 21, 2010 at 8:31 PM, James Peltier james_a_pelt...@yahoo.ca wrote:
 snip
 #sysctl -b mbuf

sure is a funny version of sysctl you are using there.


 snip



Re: em(4) ierrs

2010-09-20 Thread James Peltier
- Original Message 

 From: Andre Keller a...@list.ak.cx
 To: misc@openbsd.org
 Sent: Mon, September 20, 2010 10:15:58 AM
 Subject: em(4) ierrs
 
 Hi
 
 
 I have some odd packet loss on an OpenBSD-based router (running -current
 as of the beginning of September).
 
 The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
 have traffic (about 10-20 Mbps).
 
 
 We did some tuning (mostly with information from:
 https://calomel.org/network_performance.html) and could improve the
 performance:
 
 Currently we use the following sysctl tweaks:
 sysctl kern.maxclusters=122880
 sysctl net.inet.ip.ifq.maxlen=1536
 sysctl net.inet.tcp.recvspace=262144
 sysctl net.inet.tcp.sendspace=262144
 sysctl net.inet.udp.recvspace=262144
 sysctl net.inet.udp.sendspace=262144
 
 
 But still we have about 1300 Ierrs per minute...
 
 When we run a simple ping, we can see that something is strange. Where
 the majority of packets have an rtt of 1 ms or less, about every tenth
 packet shows an rtt of 250 ms...
 
 
 I could really use a hint of what to try next (autoneg has been disabled
 on all interfaces for testing; now it has been enabled again...)
 
 
 
 Thank you for your input
 
 
 Andri Keller
 
 
 
 
 The switches on the other end of the device are both Cisco 2960G, with an
 LACP trunk to two interfaces on the OpenBSD box:
 
 em0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6c
         priority: 0
         trunk: trunkdev trunk1
         media: Ethernet autoselect (1000baseT full-duplex)
         status: active
         inet6 fe80::225:90ff:fe05:546c%em0 prefixlen 64 scopeid 0x1
 em1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6c
         priority: 0
         trunk: trunkdev trunk1
         media: Ethernet autoselect (1000baseT full-duplex)
         status: active
         inet6 fe80::225:90ff:fe05:546d%em1 prefixlen 64 scopeid 0x2
 em2: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6e
         priority: 0
         trunk: trunkdev trunk0
         media: Ethernet 1000baseT full-duplex
         status: active
         inet6 fe80::225:90ff:fe05:546e%em2 prefixlen 64 scopeid 0x3
 em3: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6e
         priority: 0
         trunk: trunkdev trunk0
         media: Ethernet autoselect (1000baseT full-duplex)
         status: active
         inet6 fe80::225:90ff:fe05:546f%em3 prefixlen 64 scopeid 0x4
 
 trunk0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6e
         priority: 0
         trunk: trunkproto lacp
         trunk id: [(8000,00:25:90:05:54:6e,4054,,),
                 (8000,18:ef:63:bf:d7:00,0002,,)]
         trunkport em3 active,collecting,distributing
         trunkport em2 active,collecting,distributing
         groups: trunk
         media: Ethernet autoselect
         status: active
         inet ADDRESS REMOVED
         inet6 fe80::225:90ff:fe05:546e%trunk0 prefixlen 64 scopeid 0xa
         inet6 ADDRESS REMOVED
 trunk1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6c
         priority: 0
         trunk: trunkproto lacp
         trunk id: [(8000,00:25:90:05:54:6c,405C,,),
                 (8000,18:ef:63:bf:d7:00,0003,,)]
         trunkport em1 active,collecting,distributing
         trunkport em0 active,collecting,distributing
         groups: trunk
         media: Ethernet autoselect
         status: active
         inet6 fe80::225:90ff:fe05:546c%trunk1 prefixlen 64 scopeid 0xb
 
 vlan56: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
         lladdr 00:25:90:05:54:6c
         priority: 0
         vlan: 56 priority: 0 parent interface: trunk1
         groups: vlan
         status: active
         inet6 fe80::225:90ff:fe05:546c%vlan56 prefixlen 64 scopeid 0x11
         inet ADDRESS REMOVED
 
 
  netstat
 -m
   
   

 
 9023 mbufs in use:
 9003 mbufs allocated to data
 11 mbufs allocated to packet headers
 9 mbufs allocated to socket names and addresses
 528/1970/512000 mbuf 2048 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 4096 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 8192 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 9216 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 12288 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 16384 byte clusters in use (current/peak/max)
 0/8/512000 mbuf 65536 byte clusters in use (current/peak/max)
 7060 Kbytes allocated to network (46% in use)
 0  

Re: em(4) ierrs

2010-09-20 Thread Stuart Henderson
On 2010-09-20, Andre Keller a...@list.ak.cx wrote:

 I have some odd packet loss on an OpenBSD-based router (running -current
 as of the beginning of September).

 The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
 have traffic (about 10-20 Mbps).


 We did some tuning (mostly with information from:
 https://calomel.org/network_performance.html) and could improve the
 performance:

grr, that page again.

"As a very general rule, using the on-board network card is going
to be much slower than an add in PCI card"

"A gigabit network controller built on board using the CPU will
slow the entire system down. More than likely the system will not
even be able to sustain 100MB speeds while also pegging the CPU at
100%."

and people still use it for kernel tuning advice?

 Currently we use the following sysctl tweaks:
 sysctl kern.maxclusters=122880

how much?!!

 sysctl net.inet.ip.ifq.maxlen=1536

Increasing this from the default can be useful if you see drops in
net.inet.ip.ifq.drops, but I'm surprised you would have to go that high for
4x 10-20Mb.
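
If you want to keep an eye on that, something along these lines shows the
current queue length, the limit and the drop counter (the output values
here are only illustrative):

# sysctl net.inet.ip.ifq
net.inet.ip.ifq.len=0
net.inet.ip.ifq.maxlen=256
net.inet.ip.ifq.drops=0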

 sysctl net.inet.tcp.recvspace=262144
 sysctl net.inet.tcp.sendspace=262144
 sysctl net.inet.udp.recvspace=262144
 sysctl net.inet.udp.sendspace=262144

the net.inet.*space values HAVE NO EFFECT on routed packets.

 But still we have about 1300 Ierrs per minute...

 When we run a simple ping, we can see that something is strange. Where
 the majority of packets have an rtt of 1 ms or less, about every tenth
 packet shows an rtt of 250 ms...

Missing dmesg. But try disabling sensor devices or i2c controllers
(boot -c, disable somedevice, quit).



Re: em(4) ierrs

2010-09-20 Thread Andre Keller
On 20.09.2010 19:54, James Peltier wrote:
 I see you are using LACP as your trunk protocol.  You might want to check
 that all the LACP settings are correct or that there aren't any links being
 dropped for some reason that might cause the errors to occur.  Additionally,
 have you tried with only one link in the LACP pairs being active?  Does it
 stop then?

Just tried that. There is not much I can configure for LACP. On the
switch I see no errors.

I've now pulled one cable so that only one interface in the trunk is
active. The problem still exists: Ierrs on the interfaces (mostly
em2) (btw. there are no ifq.drops).
It seems to me that some buffers are running full, as now when there is
low traffic there are only a small number of errors (about 150 in 5 minutes).

Are there any other knobs I could try to tune?


Regards Andri



Re: em(4) ierrs

2010-09-20 Thread Andre Keller
On 21.09.2010 00:43, Stuart Henderson wrote:
 On 2010-09-20, Andre Keller a...@list.ak.cx wrote:
   
 I have some odd packet loss on an OpenBSD-based router (running -current
 as of the beginning of September).

 The router has 6 physical interfaces (all em, Intel 82575EB), 4 of them
 have traffic (about 10-20 Mbps).


 We did some tuning (mostly with information from:
 https://calomel.org/network_performance.html) and could improve the
 performance:
 
 grr, that page again.

 "As a very general rule, using the on-board network card is going
 to be much slower than an add in PCI card"

 "A gigabit network controller built on board using the CPU will
 slow the entire system down. More than likely the system will not
 even be able to sustain 100MB speeds while also pegging the CPU at
 100%."

 and people still use it for kernel tuning advice?
   

As we didn't find any other advice out there, we thought it might be
worth giving it a try.

   
 Currently we use the following sysctl tweaks:
 sysctl kern.maxclusters=122880
 
 how much?!!
   

Yes, this might be a bit too much:
[r...@rt01-rc: root]# netstat -m
9665 mbufs in use:
9642 mbufs allocated to data
14 mbufs allocated to packet headers
9 mbufs allocated to socket names and addresses
83/1970/122880 mbuf 2048 byte clusters in use (current/peak/max)
0/8/122880 mbuf 4096 byte clusters in use (current/peak/max)
0/8/122880 mbuf 8192 byte clusters in use (current/peak/max)
0/8/122880 mbuf 9216 byte clusters in use (current/peak/max)
0/8/122880 mbuf 12288 byte clusters in use (current/peak/max)
0/8/122880 mbuf 16384 byte clusters in use (current/peak/max)
0/8/122880 mbuf 65536 byte clusters in use (current/peak/max)
7288 Kbytes allocated to network (35% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines


 sysctl net.inet.ip.ifq.maxlen=1536
 
 increasing this from the defaults can be useful if you see drops in
 net.inet.ip.ifq.drops, I'm surprised if you have to go that high for
 4x10-20Mb.
   

Yeah, we had a lot of ifq drops at first, and after setting this value they
are gone... I read in multiple tuning tutorials that setting this to
256 * interface count makes sense.

 sysctl net.inet.tcp.recvspace=262144
 sysctl net.inet.tcp.sendspace=262144
 sysctl net.inet.udp.recvspace=262144
 sysctl net.inet.udp.sendspace=262144
 
 the net.inet.*space values HAVE NO EFFECT on routed packets.
   

OK good to know...

 But still we have about 1300 Ierrs per minute...

 When we run a simple ping, we can see that something is strange. Where
 the majority of packets have a rtt of 1ms or less about every tenth
 package shows a rtt of 250ms...
 
 missing dmesg.

Not from the machine above, but from a machine with exactly the same hardware...

OpenBSD 4.8 (GENERIC.MP) #3: Wed Aug 11 19:24:59 CEST 2010
r...@scaramanga.rbnetwork.biz:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 3486973952 (3325MB)
avail mem = 3380334592 (3223MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xcfedf000 (39 entries)
bios0: vendor Phoenix Technologies LTD version 1.3a date 11/03/2009
bios0: Supermicro X7SBi
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP _MAR MCFG APIC BOOT SPCR ERST HEST BERT EINJ
SLIC SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT
acpi0: wakeup devices PXHA(S5) PEX_(S5) LAN_(S5) USB4(S5) USB5(S5)
USB7(S5) ESB2(S5) EXP1(S5) EXP5(S5) EXP6(S5) USB1(S5) USB2(S5) USB3(S5)
USB6(S5) ESB1(S5) PCIB(S5) KBC0(S1) MSE0(S1) COM1(S5) COM2(S5) PWRB(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.43 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu0: 4MB 64b/line 16-way L2 cache
cpu0: apic clock running at 266MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu1: 4MB 64b/line 16-way L2 cache
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu2: 4MB 64b/line 16-way L2 cache
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz, 2400.09 MHz
cpu3:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,SSE3,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,NXE,LONG
cpu3: 4MB 64b/line 16-way 

Re: em(4) ierrs

2010-09-20 Thread Stuart Henderson
On 2010/09/21 01:07, Andre Keller wrote:
 ichiic0 at pci0 dev 31 function 3 "Intel 82801I SMBus" rev 0x02: apic 4
 int 17 (irq 10)
 iic0 at ichiic0
 lm1 at iic0 addr 0x2d: W83627HF
 wbng0 at iic0 addr 0x2f: w83793g

  But try disabling sensor devices or i2c controllers
  (boot -c, disable somedevice, quit).

 
 I'll try to find out what devices I could disable...

I would try wbng first. Failing that, lm. I doubt you would
need to disable ichiic but that would be the next step if there's
no improvement. You can make permanent changes to an on-disk
kernel with config(8).
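
For reference, the one-off and the permanent variants look roughly like
this (prompts abbreviated and device name assumed to be wbng; double-check
the exact syntax against boot_config(8) and config(8)):

At the boot loader, for a single boot:

boot> boot -c
...
UKC> disable wbng
UKC> quit

To make it stick across reboots, edit the installed kernel:

# config -e -o /bsd.new /bsd
UKC> disable wbng
UKC> quit
# cp /bsd /bsd.orig
# cp /bsd.new /bsd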

 Thank you for your hints...

Please follow up and let us know how it goes.



Re: em(4) ierrs

2010-09-20 Thread Henning Brauer
* Stuart Henderson s...@spacehopper.org [2010-09-21 00:47]:
 On 2010-09-20, Andre Keller a...@list.ak.cx wrote:
  We did some tuning (mostly with information from:
  https://calomel.org/network_performance.html) and could improve the
  performance:
 
 grr, that page again.
 
 "As a very general rule, using the on-board network card is going
 to be much slower than an add in PCI card"
 
 "A gigabit network controller built on board using the CPU will
 slow the entire system down. More than likely the system will not
 even be able to sustain 100MB speeds while also pegging the CPU at
 100%."
 
 and people still use it for kernel tuning advice?

Holy shit. That is indeed horribly wrong; in many cases it is the exact
opposite of the truth these days.

  sysctl net.inet.tcp.recvspace=262144
  sysctl net.inet.tcp.sendspace=262144
  sysctl net.inet.udp.recvspace=262144
  sysctl net.inet.udp.sendspace=262144
 the net.inet.*space values HAVE NO EFFECT on routed packets.

As said a gazillion times.

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting



Re: em(4) ierrs

2010-09-20 Thread Henning Brauer
* Andre Keller a...@list.ak.cx [2010-09-21 01:10]:
 As we didn't find any other advice out there, we thought it might be
 worth giving it a try.

OK, here's another piece of advice that you might want to follow, since you
can't find any other: to make your system run faster, donate all your
belongings to OpenBSD, then dance naked around the computer and eat nothing
but rice all day. After a few days, throw the computer into the ocean. It'll
be very fast (to sink).

-- 
Henning Brauer, h...@bsws.de, henn...@openbsd.org
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting