Re: Speaking of ship blockers for 9....

2012-08-14 Thread Ian FREISLICH
Gleb Smirnoff wrote:
 I Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT,
 I if=tun0, stored af=2, a0: 10.0.2.220:60985, a1: 192.41.162.30:53, proto=17,
 I found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 
 Let me give you link to my branch of pf:
 
 http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006643.html
 http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006662.html
 
 In that branch the code that puts the reverse pointer on state keys,
 as well as the m_addr_changed() function and the pf_compare_state_keys()
 had been cut away.
 
 So, this exact bug definitely can't be reproduced there. However, others
 may hide in :)
 
 Let me encourage you to try and test my branch (instructions in URLs
 above).

I do see much better performance.  However, I'm seeing this panic
after about 23 minutes (the slightly higher uptime was a result of
a manual fsck).  This system is not particularly loaded.  It's a
UP Pentium M which is our office gateway.  I can give you access
to inspect it if you like.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc046f8f4
stack pointer           = 0x28:0xeb7b7bd8
frame pointer           = 0x28:0xeb7b7bec
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4 (pf purge)
trap number             = 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c0819c2b,eb7b7a78,c05d5829,c0816ff2,c08acca0,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0816ff2,c08acca0,c07f2736,eb7b7a84,eb7b7a84,...) at kdb_backtrace+0x29
panic(c07f2736,c0845a85,c559fd68,1,1,...) at panic+0xc9
trap_fatal(0,c60c826c,c610b31c,c610ac44,8,...) at trap_fatal+0x353
trap_pfault(eb7b7b18,c05c0a2d,c0ecc500,c0ecc608,c54ec000,...) at trap_pfault+0xd9
trap(eb7b7b98) at trap+0x418
calltrap() at calltrap+0x6
--- trap 0xc, eip = 0xc046f8f4, esp = 0xeb7b7bd8, ebp = 0xeb7b7bec ---
pf_state_key_detach(eb7b7c18,c046af2a,502a6f69,0,8000,...) at pf_state_key_detach+0x74
pf_detach_state(c64d5d00,0,8000,0,c559fbc0,...) at pf_detach_state+0x1c6
pf_unlink_state(c64d5d00,1,0,0,c0870398,...) at pf_unlink_state+0x1c5
pf_purge_expired_states(c08947c0,0,0,c07eadbf,64,...) at pf_purge_expired_states+0xe6
pf_purge_thread(0,eb7b7d08,0,c54ec000,0,...) at pf_purge_thread+0x14f
fork_exit(c0471b60,0,eb7b7d08) at fork_exit+0xa2
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xeb7b7d40, ebp = 0 ---
Uptime: 57m29s
Physical memory: 2038 MB
Dumping 189 MB: 174 158 142 126 110 94 78 62 46 30 14



(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:249
#1  0xc05d563a in kern_reboot (howto=260)
    at /usr/src.pflock/sys/kern/kern_shutdown.c:449
#2  0xc05d5888 in panic (fmt=Variable fmt is not available.)
    at /usr/src.pflock/sys/kern/kern_shutdown.c:637
#3  0xc07b8b23 in trap_fatal (frame=0xeb7b7b98, eva=0)
    at /usr/src.pflock/sys/i386/i386/trap.c:1028
#4  0xc07b8c09 in trap_pfault (frame=0xeb7b7b98, usermode=0, eva=0)
    at /usr/src.pflock/sys/i386/i386/trap.c:881
#5  0xc07b9a58 in trap (frame=dwarf2_read_address: Corrupted DWARF expression.)
    at /usr/src.pflock/sys/i386/i386/trap.c:552
#6  0xc07a579c in calltrap () at /usr/src.pflock/sys/i386/i386/exception.s:169
#7  0xc046f8f4 in pf_state_key_detach (s=0xc64d5d00, idx=1)
    at /usr/src.pflock/sys/contrib/pf/net/pf.c:1040
#8  0xc04713f6 in pf_detach_state (s=0xc64d5d00)
    at /usr/src.pflock/sys/contrib/pf/net/pf.c:1006
#9  0xc0471975 in pf_unlink_state (s=0xc64d5d00, flags=Variable flags is not available.)
    at /usr/src.pflock/sys/contrib/pf/net/pf.c:1520
#10 0xc0471a96 in pf_purge_expired_states (maxcheck=148)
    at /usr/src.pflock/sys/contrib/pf/net/pf.c:1573
#11 0xc0471caf in pf_purge_thread (v=0x0)
    at /usr/src.pflock/sys/contrib/pf/net/pf.c:1371
#12 0xc05a5af2 in fork_exit (callout=0xc0471b60 pf_purge_thread, arg=0x0, frame=0xeb7b7d08)
    at /usr/src.pflock/sys/kern/kern_fork.c:995
#13 0xc07a5814 in fork_trampoline ()
    at /usr/src.pflock/sys/i386/i386/exception.s:276
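
For anyone triaging a dump like the one above, here is a minimal sketch (not part of the original report) that folds wrapped kgdb frames back together and lists the frames that live in pf.c. The function name parse_backtrace and the regular expressions are illustrative assumptions; the sample frames are copied from the backtrace above. Together with the trap header (fault virtual address 0x0, supervisor write), the innermost pf frame, pf_state_key_detach at pf.c:1040, looks consistent with a write through a NULL pointer, though the dump alone does not prove that.

```python
import re

# A new frame starts with "#<n>"; any other non-empty line continues the previous frame.
FRAME_START = re.compile(r"^#\d+\s")
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")
LOCATION = re.compile(r"\bat (\S+):(\d+)\s*$")


def parse_backtrace(text):
    """Return (frame_no, function, file, line) tuples from kgdb 'bt' output."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if FRAME_START.match(line):
            frames.append(line)
        elif frames:
            frames[-1] += " " + line  # re-join a wrapped frame
    parsed = []
    for frame in frames:
        head = FRAME.match(frame)
        if not head:
            continue
        loc = LOCATION.search(frame)
        parsed.append((
            int(head.group(1)),
            head.group(2),
            loc.group(1) if loc else None,
            int(loc.group(2)) if loc else None,
        ))
    return parsed


# Frames copied from the backtrace in this message.
SAMPLE = """\
#6  0xc07a579c in calltrap () at /usr/src.pflock/sys/i386/i386/exception.s:169
#7  0xc046f8f4 in pf_state_key_detach (s=0xc64d5d00, idx=1)
at /usr/src.pflock/sys/contrib/pf/net/pf.c:1040
#8  0xc04713f6 in pf_detach_state (s=0xc64d5d00)
at /usr/src.pflock/sys/contrib/pf/net/pf.c:1006
"""

pf_frames = [f for f in parse_backtrace(SAMPLE) if f[2] and f[2].endswith("/pf.c")]
for number, function, path, line in pf_frames:
    print(f"#{number} {function} pf.c:{line}")
```

Feeding it the full `bt` output instead of SAMPLE should work the same way, since the parser only relies on the `#N` prefix and the trailing `at file:line`.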

Ian

-- 
Ian Freislich
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Speaking of ship blockers for 9....

2012-08-11 Thread Ian FREISLICH
Gleb Smirnoff wrote:
 Let me give you link to my branch of pf:
 
 http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006643.html
 http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006662.html
 
 In that branch the code that puts the reverse pointer on state keys,
 as well as the m_addr_changed() function and the pf_compare_state_keys()
 had been cut away.
 
 So, this exact bug definitely can't be reproduced there. However, others
 may hide in :)

Thanks.  I'll be able to work on this next week.  My system is
pretty similar to yours - 16 cores, full BGP RIB, 20+ VLANs + CARP
on 4*bce(4), PF+Sync, 400k+ states, NAT, tables, anchors etc.

The complication is that the production system is on 8 and the
pfsync is incompatible with 9 and CURRENT.  And, 9/CURRENT is
unusable for me as a backup without this fix because of the state
mismatch rate.

Ian

-- 
Ian Freislich


Re: Speaking of ship blockers for 9....

2012-08-09 Thread Gleb Smirnoff
  Ian,

On Tue, Aug 07, 2012 at 08:17:56PM +0200, Ian FREISLICH wrote:
I I have a problem that's been getting progressively worse as the
I source progresses.  So much so that it's had me searching all the
I way from 8.0-RELEASE to 10-CURRENT without luck on both amd64 and
I i386.
I 
I pf(4) erroneously mismatches state and then blocks an active flow.
I It seems that 8.X does so silently and 9 to -CURRENT do so verbosely.
I Whether silent or loud, the effect on traffic makes it impractical
I to use FreeBSD+PF for a firewall in any setting (my use is home,
I small office, large office and moderately large datacenter core
I router).  It appears that this has actually been a forever problem
I that is just being tickled more now.
...
I ...
I   state-mismatch                    277767            3.6/s
I 
I That's 277767 flows terminated in the last almost 22 hours due to
I this pf bug. (!!!)
I 
I 9.1-PRERELEASE logs (as does -CURRENT):
I Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
if=tun0, stored af=2, a0: 10.0.2.220:60985, a1: 192.41.162.30:53, proto=17, 
found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.

Let me give you link to my branch of pf:

http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006643.html
http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006662.html

In that branch the code that puts the reverse pointer on state keys,
as well as the m_addr_changed() function and the pf_compare_state_keys()
had been cut away.

So, this exact bug definitely can't be reproduced there. However, others
may hide in :)

Let me encourage you to try and test my branch (instructions in URLs
above).

P.S. I plan to merge it to head at the end of August.

-- 
Totus tuus, Glebius.


Re: Speaking of ship blockers for 9....

2012-08-07 Thread Garrett Cooper
On Aug 7, 2012, at 11:17 AM, Ian FREISLICH i...@clue.co.za wrote:

 Garrett Cooper
Is this in 9.1 -PRERELEASE, -RELEASE (or whatever the official
 label is...)? If so, it seems like this would be a ship blocker.
 
 I have a problem that's been getting progressively worse as the
 source progresses.  So much so that it's had me searching all the
 way from 8.0-RELEASE to 10-CURRENT without luck on both amd64 and
 i386.
 
 pf(4) erroneously mismatches state and then blocks an active flow.
 It seems that 8.X does so silently and 9 to -CURRENT do so verbosely.
 Whether silent or loud, the effect on traffic makes it impractical
 to use FreeBSD+PF for a firewall in any setting (my use is home,
 small office, large office and moderately large datacenter core
 router).  It appears that this has actually been a forever problem
 that is just being tickled more now.
 
 Here's from my home firewall:
 Status: Enabled for 7 days 02:57:58   Debug: Urgent
 
 State Table                          Total             Rate
   current entries                     1653
   searches                        45792251           74.4/s
   inserts                           428375            0.7/s
   removals                          426722            0.7/s
 ...
   state-mismatch                      1586            0.0/s
 
 
 Here's from a moderately busy firewall:
 Status: Enabled for 0 days 21:40:44   Debug: Urgent
 
 State Table                          Total             Rate
   current entries                   122395
   searches                      4428641685        56745.4/s
   inserts                        202644593         2596.5/s
   removals                       202522198         2595.0/s
 ...
   state-mismatch                    277767            3.6/s
 
 That's 277767 flows terminated in the last almost 22 hours due to
 this pf bug. (!!!)
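
As a cross-check on the figures above, here is a small sketch (not part of the original mail) that recomputes the mismatch rate from the counter and the "Enabled for" uptime. The helper name per_second is an illustrative assumption, and comparing mismatches against inserts is only one possible way to put the number in proportion; whether every mismatch corresponds to exactly one terminated flow is the reading given in the text, not something the counters prove.

```python
from datetime import timedelta


def per_second(total, uptime):
    """Average events per second over the given uptime."""
    return total / uptime.total_seconds()


# Values from the "moderately busy firewall" status output.
uptime = timedelta(days=0, hours=21, minutes=40, seconds=44)
mismatches = 277767
inserts = 202644593

rate = per_second(mismatches, uptime)
share = mismatches / inserts  # assumption: inserts as the denominator

print(f"{rate:.1f} mismatches/s over {uptime.total_seconds():.0f} s")
print(f"{share:.3%} of inserted states")
```

The rate comes out at the 3.6/s shown in the Rate column, which also confirms how the fused table columns split.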
 
 9.1-PRERELEASE logs (as does -CURRENT):
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:60985, a1: 192.41.162.30:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:52995, a1: 41.154.2.100:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:60095, a1: 206.223.136.200:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:50463, a1: 206.223.136.200:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:56748, a1: 192.41.162.30:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
 Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, 
 if=tun0, stored af=2, a0: 10.0.2.220:60793, a1: 192.41.162.30:53, proto=17, 
 found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
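
To make the pattern in these messages easier to see, here is a hedged sketch (not part of the original mail) that parses the "state key linking mismatch" lines and counts which "found" key they collide with. The regular expression is inferred from the log lines quoted here and assumes IPv4 addr:port formatting only; parse_mismatches is a hypothetical helper name. In the sample, every stored key is a UDP (proto=17) DNS query from 10.0.2.220, and every found key is the same UDP flow on port 1701.

```python
import re
from collections import Counter

# One state key as printed by the kernel message (IPv4 only, by assumption).
KEY = r"af=(\d+), a0: (\S+?):(\d+), a1: (\S+?):(\d+), proto=(\d+)"
PATTERN = re.compile(
    r"state key linking mismatch! dir=(\w+), if=(\w+), "
    r"stored " + KEY + r", found " + KEY
)


def parse_mismatches(text):
    """Return one dict per mismatch message, tolerating mail line wrapping."""
    flat = " ".join(text.split())  # undo wrapping before matching
    events = []
    for match in PATTERN.finditer(flat):
        g = match.groups()
        events.append({
            "dir": g[0],
            "ifname": g[1],
            "stored": (f"{g[3]}:{g[4]}", f"{g[5]}:{g[6]}", int(g[7])),
            "found": (f"{g[9]}:{g[10]}", f"{g[11]}:{g[12]}", int(g[13])),
        })
    return events


# Two of the log entries quoted in this thread, wrapped as in the mail.
SAMPLE = """\
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT,
if=tun0, stored af=2, a0: 10.0.2.220:60985, a1: 192.41.162.30:53, proto=17,
found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT,
if=tun0, stored af=2, a0: 10.0.2.220:52995, a1: 41.154.2.100:53, proto=17,
found af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
"""

events = parse_mismatches(SAMPLE)
found_counts = Counter(event["found"] for event in events)
for key, count in found_counts.most_common():
    print(count, key)
```

Run against a full /var/log/messages extract, the same counter would show whether the collisions always involve one long-lived flow, which might be useful detail for a PR.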

Filed a PR yet with packet captures?
Thanks,
-Garrett


Re: Speaking of ship blockers for 9....

2012-08-07 Thread matt

On 08/07/12 11:43, Garrett Cooper wrote:

On Aug 7, 2012, at 11:17 AM, Ian FREISLICH i...@clue.co.za wrote:


Garrett Cooper

Is this in 9.1 -PRERELEASE, -RELEASE (or whatever the official
label is...)? If so, it seems like this would be a ship blocker.

I have a problem that's been getting progressively worse as the
source progresses.  So much so that it's had me searching all the
way from 8.0-RELEASE to 10-CURRENT without luck on both amd64 and
i386.

pf(4) erroneously mismatches state and then blocks an active flow.
It seems that 8.X does so silently and 9 to -CURRENT do so verbosely.
Whether silent or loud, the effect on traffic makes it impractical
to use FreeBSD+PF for a firewall in any setting (my use is home,
small office, large office and moderately large datacenter core
router).  It appears that this has actually been a forever problem
that is just being tickled more now.

Here's from my home firewall:
Status: Enabled for 7 days 02:57:58   Debug: Urgent

State Table                          Total             Rate
  current entries                     1653
  searches                        45792251           74.4/s
  inserts                           428375            0.7/s
  removals                          426722            0.7/s
...
  state-mismatch                      1586            0.0/s


Here's from a moderately busy firewall:
Status: Enabled for 0 days 21:40:44   Debug: Urgent

State Table                          Total             Rate
  current entries                   122395
  searches                      4428641685        56745.4/s
  inserts                        202644593         2596.5/s
  removals                       202522198         2595.0/s
...
  state-mismatch                    277767            3.6/s

That's 277767 flows terminated in the last almost 22 hours due to
this pf bug. (!!!)

9.1-PRERELEASE logs (as does -CURRENT):
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:60985, a1: 192.41.162.30:53, proto=17, found af=2, 
a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:52995, a1: 41.154.2.100:53, proto=17, found af=2, 
a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:60095, a1: 206.223.136.200:53, proto=17, found 
af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:50463, a1: 206.223.136.200:53, proto=17, found 
af=2, a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:56748, a1: 192.41.162.30:53, proto=17, found af=2, 
a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.
Jul 22 08:54:25 brane kernel: pf: state key linking mismatch! dir=OUT, if=tun0, 
stored af=2, a0: 10.0.2.220:60793, a1: 192.41.162.30:53, proto=17, found af=2, 
a0: 41.154.2.53:1701, a1: 41.133.165.161:59051, proto=17.

 Filed a PR yet with packet captures?
Thanks,
-Garrett

I was having this problem on one machine but not another (different 
pf.confs). Are you using synproxy state or modulate state? Feel OK 
posting a basic pf.conf that experiences the issue?


I feel like there was something with either scrub or synproxy I had to 
remove to make the hurting stop.
Obviously that means something is probably borked, and I will share in 
the no-pr shame.
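
For comparing rulesets between a machine that shows the problem and one that does not, a rough sketch like the following could flag the options mentioned above. It is only an illustration: find_suspect_rules is a hypothetical helper, the sample ruleset is invented and not taken from any machine in this thread, and the comment stripping is naive (it would mis-handle a "#" inside a quoted string).

```python
import re

# pf.conf options named in this message as possible triggers.
SUSPECTS = {
    "synproxy state": re.compile(r"\bsynproxy\s+state\b"),
    "modulate state": re.compile(r"\bmodulate\s+state\b"),
    "scrub": re.compile(r"\bscrub\b"),
}


def find_suspect_rules(conf_text):
    """Return (line_no, keyword, rule) for rules using a suspect option."""
    hits = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        rule = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not rule:
            continue
        for keyword, pattern in SUSPECTS.items():
            if pattern.search(rule):
                hits.append((number, keyword, rule))
    return hits


# Hypothetical ruleset, for illustration only.
SAMPLE_CONF = """\
# hypothetical ruleset, not taken from the thread
ext_if = "tun0"
scrub in all
pass out on $ext_if proto udp to port 53 keep state
pass in on $ext_if proto tcp to port 22 synproxy state
"""

hits = find_suspect_rules(SAMPLE_CONF)
for number, keyword, rule in hits:
    print(f"line {number}: {keyword}: {rule}")
```

Disabling the flagged rules one at a time and watching the state-mismatch counter in `pfctl -si` would be one way to narrow it down before filing a PR.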


Matt