Re: 3 show-stopper issues with 9-BETA3

2011-10-14 Thread Gavin Atkinson
On Wed, 5 Oct 2011, Ian FREISLICH wrote:
>
> In no particular order:

It's a shame that nobody has yet picked up on this; it is a very useful 
list of bugs in 9.0.  Is there any chance you could log these three issues 
as three separate PRs so that they don't get lost?  Please tag them with 
[regression] in the subject if they are indeed regressions from a previous 
version (they sound like they are) so that they hopefully get onto the 
release engineering radar.

> 1. bce(4) transmit and receive ring buffer overruns
>    On a moderately busy router with a full BGP table and
>    aggregate throughput of between 200 Mbps and 800 Mbps, I get
>    these buffer overruns at an average rate of 28 per second
>    on the busiest interface.
>
>    [firewall1.jnb1] ~ # sysctl dev.bce |grep com_no_buffers
>    dev.bce.0.com_no_buffers: 101
>    dev.bce.1.com_no_buffers: 0
>    dev.bce.2.com_no_buffers: 32547
>    dev.bce.3.com_no_buffers: 444
>
>    I've tried increasing TX_PAGES and RX_PAGES in
>    sys/dev/bce/if_bcereg.h to 64, which is what resolved this
>    problem on 8.2-STABLE in the past, but to no avail.
>    It appears that there is a hard limit of 8 according to
>    bce_set_tunables() in if_bce.c, but no value of hw.bce.tx_pages
>    or hw.bce.rx_pages makes the slightest difference.
>
> 2. carp(4) on my backup router randomly takes over MASTER on the
>    standby host, but when ifconfig claims the carp interface
>    is master tcpdump shows that it's not broadcasting its
>    advertisement.  The actual master still broadcasts and no
>    setting of advskew or advbase changes the 9-BETA host's
>    idea of who is actually master.  I have to reboot the host
>    to reset the carp interfaces.  Destroying and re-creating
>    them just brings them up as backup for about a second and
>    then they regress to master.

Is ipv6 involved in this?

Do you have a couple of sample config files that I can drop onto two 
machines and recreate the issue, by any chance?

> 3. PF doesn't expire state. The state table on my older host (pre
>    OpenBSD-4.5) has the following stats:
>
>    Status: Enabled for 0 days 00:37:17   Debug: Urgent
>    State Table                          Total             Rate
>      current entries                   169546
>      searches                        94387451        42193.8/s
>      inserts                          4012389         1793.6/s
>      removals                         3842843         1717.9/s
>
>    The 9-BETA3 host's current entries exactly match the number
>    of inserts until it hits the hard limit of 1.5M entries and
>    can add no more.  It takes about 10 minutes to fill up and
>    then no new flows are routed.

I've seen a few reports of this, and it's quite concerning.  Please, can 
you submit this as a PR?

Thanks,

Gavin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: 3 show-stopper issues with 9-BETA3

2011-10-14 Thread Vincent Hoffman
On 14/10/2011 19:58, Gavin Atkinson wrote:
>> 3. PF doesn't expire state. The state table on my older host (pre
>>    OpenBSD-4.5) has the following stats:
>>
>>    Status: Enabled for 0 days 00:37:17   Debug: Urgent
>>    State Table                          Total             Rate
>>      current entries                   169546
>>      searches                        94387451        42193.8/s
>>      inserts                          4012389         1793.6/s
>>      removals                         3842843         1717.9/s
>>
>>    The 9-BETA3 host's current entries exactly match the number
>>    of inserts until it hits the hard limit of 1.5M entries and
>>    can add no more.  It takes about 10 minutes to fill up and
>>    then no new flows are routed.
> I've seen a few reports of this, and it's quite concerning.  Please, can
> you submit this as a PR?
For tracking, this was a previous report with apparently a temporary
workaround.
http://lists.freebsd.org/pipermail/freebsd-pf/2011-October/006333.html
I have a stable-9 virtual machine I can test on if needed, but I have pf
loaded as a module at the moment, so I don't have the issue.


Vince



3 show-stopper issues with 9-BETA3

2011-10-05 Thread Ian FREISLICH
Hi

In no particular order:

1. bce(4) transmit and receive ring buffer overruns
On a moderately busy router with a full BGP table and
aggregate throughput of between 200 Mbps and 800 Mbps, I get
these buffer overruns at an average rate of 28 per second
on the busiest interface.

[firewall1.jnb1] ~ # sysctl dev.bce |grep com_no_buffers
dev.bce.0.com_no_buffers: 101
dev.bce.1.com_no_buffers: 0
dev.bce.2.com_no_buffers: 32547
dev.bce.3.com_no_buffers: 444

I've tried increasing TX_PAGES and RX_PAGES in
sys/dev/bce/if_bcereg.h to 64, which is what resolved this
problem on 8.2-STABLE in the past, but to no avail.
It appears that there is a hard limit of 8 according to
bce_set_tunables() in if_bce.c, but no value of hw.bce.tx_pages
or hw.bce.rx_pages makes the slightest difference.

2. carp(4) on my backup router randomly takes over MASTER on the
standby host, but when ifconfig claims the carp interface
is master tcpdump shows that it's not broadcasting its
advertisement.  The actual master still broadcasts and no
setting of advskew or advbase changes the 9-BETA host's
idea of who is actually master.  I have to reboot the host
to reset the carp interfaces.  Destroying and re-creating
them just brings them up as backup for about a second and
then they regress to master.

3. PF doesn't expire state. The state table on my older host (pre
OpenBSD-4.5) has the following stats:

Status: Enabled for 0 days 00:37:17   Debug: Urgent
State Table                          Total             Rate
  current entries                   169546
  searches                        94387451        42193.8/s
  inserts                          4012389         1793.6/s
  removals                         3842843         1717.9/s

The 9-BETA3 host's current entries exactly match the number
of inserts until it hits the hard limit of 1.5M entries and
can add no more.  It takes about 10 minutes to fill up and
then no new flows are routed.
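
As a sanity check on the figures above, the counters from the older host are internally consistent: current entries equal inserts minus removals, and each rate is the total divided by the uptime. The Python sketch below reproduces that arithmetic and then estimates how long a table with no removals at all would take to reach the limit. The fill-time estimate rests on an assumption that is not in the report, namely that the 9-BETA3 host inserts states at roughly the same rate as the older host.

```python
# Figures copied from the pfctl output of the older host.
uptime_s = 37 * 60 + 17        # "Enabled for 0 days 00:37:17"
inserts = 4012389
removals = 3842843
current_entries = 169546

# With working state expiry: current = inserts - removals.
assert inserts - removals == current_entries


def rate(total, seconds):
    """Average events per second, rounded the way the table shows it."""
    return round(total / seconds, 1)


insert_rate = rate(inserts, uptime_s)
removal_rate = rate(removals, uptime_s)

# ASSUMPTION (not from the report): the 9-BETA3 host sees a similar
# insert rate. With zero removals, current entries simply track inserts.
state_limit = 1_500_000        # the "hard limit of 1.5M entries"
fill_minutes = state_limit / insert_rate / 60

print(f"inserts/s={insert_rate} removals/s={removal_rate}")
print(f"minutes to fill with no removals: {fill_minutes:.1f}")
```

Under that assumption the table fills in well under a quarter of an hour, which is the same order of magnitude as the "about 10 minutes" observed.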

We're in a quiet period at the moment, so I can keep a 9-X host
around for a few days.  I'll be able to try things until I have to
downgrade the other host at the end of the week.  Incompatibility
between pf on 8.2-STABLE and 9-X after 2011-06-28 makes testing a
little difficult though because I'm not able to synchronise state.

FWIW, the tuning that has been done eliminates the issue on 8.2-STABLE:
[firewall1.jnb1] ~ # cat /boot/loader.conf 
net.isr.maxthreads=8
net.isr.defaultqlimit=4096
net.isr.maxqlimit=81920
net.isr.direct=1
kern.ipc.nmbclusters=262144
kern.maxusers=1024

[firewall1.jnb1] ~ # cat /etc/sysctl.conf 
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.ip.fastforwarding=1
net.inet.carp.preempt=1
net.inet.icmp.icmplim_output=0
net.inet.icmp.icmplim=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
net.route.netisr_maxqlen=8192

diff -u -d -r1.26.2.7 if_bcereg.h
--- if_bcereg.h 15 Aug 2010 23:56:57 -  1.26.2.7
+++ if_bcereg.h 5 Oct 2011 14:29:15 -
@@ -6150,7 +6150,7 @@
  * Page count must remain a power of 2 for all
  * of the math to work correctly.
  */
-#define TX_PAGES   2
+#define TX_PAGES   64
 #define TOTAL_TX_BD_PER_PAGE  (BCM_PAGE_SIZE / sizeof(struct tx_bd))
 #define USABLE_TX_BD_PER_PAGE (TOTAL_TX_BD_PER_PAGE - 1)
 #define TOTAL_TX_BD (TOTAL_TX_BD_PER_PAGE * TX_PAGES)
@@ -6170,7 +6170,7 @@
  * Page count must remain a power of 2 for all
  * of the math to work correctly.
  */
-#define RX_PAGES   2
+#define RX_PAGES   64
 #define TOTAL_RX_BD_PER_PAGE  (BCM_PAGE_SIZE / sizeof(struct rx_bd))
 #define USABLE_RX_BD_PER_PAGE (TOTAL_RX_BD_PER_PAGE - 1)
 #define TOTAL_RX_BD (TOTAL_RX_BD_PER_PAGE * RX_PAGES)
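
For anyone comparing runs of item 1 before and after a change, note that the com_no_buffers counters are cumulative, so the per-second rate has to come from two samples taken a known interval apart. The following is a minimal Python sketch of that calculation. It only parses text in the format printed by the sysctl command shown earlier; the second sample and the 60-second interval are hypothetical values chosen for illustration, and counter wrap-around is not handled.

```python
import re

# Matches lines such as "dev.bce.2.com_no_buffers: 32547".
COUNTER_RE = re.compile(r"^dev\.bce\.(\d+)\.com_no_buffers:\s*(\d+)$")


def parse_no_buffers(sysctl_text):
    """Map bce unit number -> cumulative com_no_buffers count."""
    counts = {}
    for line in sysctl_text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def overrun_rates(before, after, interval_s):
    """Per-second increase for every unit present in both samples."""
    return {
        unit: (after[unit] - before[unit]) / interval_s
        for unit in before
        if unit in after
    }


# First sample: the values quoted in this message.
first = parse_no_buffers("""\
dev.bce.0.com_no_buffers: 101
dev.bce.1.com_no_buffers: 0
dev.bce.2.com_no_buffers: 32547
dev.bce.3.com_no_buffers: 444
""")

# HYPOTHETICAL second sample, imagined as taken 60 seconds later.
second = parse_no_buffers("""\
dev.bce.0.com_no_buffers: 161
dev.bce.1.com_no_buffers: 0
dev.bce.2.com_no_buffers: 34227
dev.bce.3.com_no_buffers: 504
""")

rates = overrun_rates(first, second, 60)
for unit in sorted(rates):
    print(f"bce{unit}: {rates[unit]:.1f} overruns/s")
```

Feeding it two real captures of `sysctl dev.bce | grep com_no_buffers` shows directly whether a tunable change moved the rate on the busiest interface.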

Ian

-- 
Ian Freislich