Re: Fatal page fault in cbq_enqueue()
On Mon, Oct 09, 2017 at 03:39:41AM -0700, Aaron J. Grier wrote:
> On Sat, Oct 07, 2017 at 07:33:45PM +1100, Paul Ripke wrote:
> > Ok, I've bitten the bullet and switched to npf and altq. Setup seems
> > to be working so far, and it's fixed another annoyance I recently
> > noticed: ipf was blocking in-bound syn-acks from select remote sites.
> > 99% of sites were fine, a handful of sites broke. It's all fine with
> > npf.
>
> got an example of conf files? or is it separately using npf.conf and
> altq.conf?

Nope, just separately using npf.conf & altq.conf. Seems to work as
expected, at least after I figured out pr/52609, which caught me by
surprise.

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: Fatal page fault in cbq_enqueue()
On Sat, Oct 07, 2017 at 07:33:45PM +1100, Paul Ripke wrote:
> Ok, I've bitten the bullet and switched to npf and altq. Setup seems
> to be working so far, and it's fixed another annoyance I recently
> noticed: ipf was blocking in-bound syn-acks from select remote sites.
> 99% of sites were fine, a handful of sites broke. It's all fine with
> npf.

got an example of conf files? or is it separately using npf.conf and
altq.conf?

--
Aaron J. Grier | "Not your ordinary poofy goof." | agr...@poofygoof.com
Re: Fatal page fault in cbq_enqueue()
On Oct 7, 7:33pm, s...@stix.id.au (Paul Ripke) wrote:
-- Subject: Re: Fatal page fault in cbq_enqueue()

| Ok, I've bitten the bullet and switched to npf and altq. Setup seems
| to be working so far, and it's fixed another annoyance I recently
| noticed: ipf was blocking in-bound syn-acks from select remote sites.
| 99% of sites were fine, a handful of sites broke. It's all fine with
| npf.

Great :-)

christos
Re: Fatal page fault in cbq_enqueue()
Ok, I've bitten the bullet and switched to npf and altq. Setup seems
to be working so far, and it's fixed another annoyance I recently
noticed: ipf was blocking in-bound syn-acks from select remote sites.
99% of sites were fine, a handful of sites broke. It's all fine with
npf.

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: Fatal page fault in cbq_enqueue()
Hello. We use altq plus pf all over the place in our company. We've
looked at using npf, but it doesn't have the feature set we need to make
all of our stuff go. Right now, we're using NetBSD-5, which is rock solid
in terms of reliability. I don't know if it's easier to make pf and altq
work in NET_MPSAFE mode, or add the missing functionality to npf, but for
my part, I think it's easier for me to fix pf plus altq in terms of using
it in an SMP environment than it is to add the missing features to npf.

-thanks
-Brian

On Sep 26, 11:37am, Paul Ripke wrote:
} Subject: Re: Fatal page fault in cbq_enqueue()
} Recently upgraded to netbsd-8 branch, and I'm still seeing these
} occasionally. Eg:
}
} Sep 25 20:57:16 slave /netbsd: fatal page fault in supervisor mode
} Sep 25 20:57:16 slave /netbsd: trap type 6 code 0 rip 0x807a68b9 cs 0x8 rflags 0x10286 cr2 0x8 ilevel 0x8 rsp 0xfe80400077e0
} Sep 25 20:57:16 slave /netbsd: curlwp 0xfe811f932420 pid 0.3 lowest kstack 0xfe80400042c0
} Sep 25 20:57:16 slave /netbsd: panic: trap
} Sep 25 20:57:16 slave /netbsd: cpu0: Begin traceback...
} Sep 25 20:57:16 slave /netbsd: vpanic() at netbsd:vpanic+0x140
} Sep 25 20:57:16 slave /netbsd: snprintf() at netbsd:snprintf
} Sep 25 20:57:16 slave /netbsd: trap() at netbsd:trap+0xc6b
} Sep 25 20:57:16 slave /netbsd: --- trap (number 6) ---
} Sep 25 20:57:16 slave /netbsd: rmc_queue_packet() at netbsd:rmc_queue_packet+0x150
} Sep 25 20:57:16 slave /netbsd: cbq_enqueue() at netbsd:cbq_enqueue+0xee
} Sep 25 20:57:16 slave /netbsd: ifq_enqueue2() at netbsd:ifq_enqueue2+0xc4
} Sep 25 20:57:16 slave /netbsd: sppp_output() at netbsd:sppp_output+0x1ab
} Sep 25 20:57:16 slave /netbsd: ip6_if_output() at netbsd:ip6_if_output+0x60
} Sep 25 20:57:16 slave /netbsd: ipf_fastroute() at netbsd:ipf_fastroute+0x97e
} Sep 25 20:57:16 slave /netbsd: ipf_send_ip() at netbsd:ipf_send_ip+0x13d
} Sep 25 20:57:16 slave /netbsd: ipf_check() at netbsd:ipf_check+0xcfc
} Sep 25 20:57:16 slave /netbsd: pfil_run_hooks() at netbsd:pfil_run_hooks+0x117
} Sep 25 20:57:16 slave /netbsd: ip6_input() at netbsd:ip6_input+0x278
} Sep 25 20:57:16 slave /netbsd: ip6intr() at netbsd:ip6intr+0x71
} Sep 25 20:57:16 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0xd3
} Sep 25 20:57:16 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe8040007ff0
} Sep 25 20:57:16 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
}
} Are there many users of altq+ipf out there? Perhaps now is a good time to
} switch to npf...
}
} --
} Paul Ripke
} "Great minds discuss ideas, average minds discuss events, small minds
}  discuss people."
} -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>-- End of excerpt from Paul Ripke
Re: Fatal page fault in cbq_enqueue()
On Wed, Mar 08, 2017 at 08:53:56PM -0500, Christos Zoulas wrote:
> On Mar 9, 12:16pm, s...@stix.id.au (Paul Ripke) wrote:
> -- Subject: Re: Fatal page fault in cbq_enqueue()
>
> | > > Index: altq_classq.h
> | > > ===
> | > > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
> | > > retrieving revision 1.7
> | > > diff -u -u -r1.7 altq_classq.h
> | > > --- altq_classq.h 12 Oct 2006 19:59:08 - 1.7
> | > > +++ altq_classq.h 27 Jan 2017 18:10:12 -
> | > > @@ -108,9 +108,9 @@
> | > >  {
> | > >      struct mbuf *m, *m0;
> | > >
> | > > -    if ((m = qtail(q)) == NULL)
> | > > +    if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
> | > >          return (NULL);
> | > > -    if ((m0 = m->m_nextpkt) != m)
> | > > +    if (m0 != m)
> | > >          m->m_nextpkt = m0->m_nextpkt;
> | > >      else
> | > >          qtail(q) = NULL;
> | >
> | > Indeed... Well, we'll see how it goes, I'm running with that now. I've
> | > had one crash since, so a couple of weeks might be enough to have some
> | > idea.
> |
> | Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.
>
> So it took how many days?

Booted with that patch around Jan 28. I think I've had a couple of silent
reboots since, followed by:

Mar 3 10:45:30 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar 3 10:47:42 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar 9 11:52:03 slave /netbsd: panic: _rmc_wrr_dequeue_next

That's a remarkably tight cluster of crashes.

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: Fatal page fault in cbq_enqueue()
On Mar 9, 12:16pm, s...@stix.id.au (Paul Ripke) wrote:
-- Subject: Re: Fatal page fault in cbq_enqueue()

| > > Index: altq_classq.h
| > > ===
| > > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
| > > retrieving revision 1.7
| > > diff -u -u -r1.7 altq_classq.h
| > > --- altq_classq.h 12 Oct 2006 19:59:08 - 1.7
| > > +++ altq_classq.h 27 Jan 2017 18:10:12 -
| > > @@ -108,9 +108,9 @@
| > >  {
| > >      struct mbuf *m, *m0;
| > >
| > > -    if ((m = qtail(q)) == NULL)
| > > +    if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
| > >          return (NULL);
| > > -    if ((m0 = m->m_nextpkt) != m)
| > > +    if (m0 != m)
| > >          m->m_nextpkt = m0->m_nextpkt;
| > >      else
| > >          qtail(q) = NULL;
| >
| > Indeed... Well, we'll see how it goes, I'm running with that now. I've
| > had one crash since, so a couple of weeks might be enough to have some
| > idea.
|
| Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.

So it took how many days?

christos
Re: Fatal page fault in cbq_enqueue()
On Wed, Feb 08, 2017 at 11:09:55PM +1100, Paul Ripke wrote:
> On Fri, Jan 27, 2017 at 06:10:52PM +, Christos Zoulas wrote:
> > In article <20170127111545.GB9450@slave.private>,
> > Paul Ripke wrote:
> > >Still happening, although not as often. Maybe the hot weather helps...
> > >
> > >I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.
> >
> > How about this? At least you'll not crash
> >
> > christos
> >
> > Index: altq_classq.h
> > ===
> > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
> > retrieving revision 1.7
> > diff -u -u -r1.7 altq_classq.h
> > --- altq_classq.h 12 Oct 2006 19:59:08 - 1.7
> > +++ altq_classq.h 27 Jan 2017 18:10:12 -
> > @@ -108,9 +108,9 @@
> >  {
> >      struct mbuf *m, *m0;
> >
> > -    if ((m = qtail(q)) == NULL)
> > +    if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
> >          return (NULL);
> > -    if ((m0 = m->m_nextpkt) != m)
> > +    if (m0 != m)
> >          m->m_nextpkt = m0->m_nextpkt;
> >      else
> >          qtail(q) = NULL;
>
> Indeed... Well, we'll see how it goes, I'm running with that now. I've
> had one crash since, so a couple of weeks might be enough to have some
> idea.

Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.

Mar 3 10:45:30 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar 3 10:45:30 slave /netbsd: cpu0: Begin traceback...
Mar 3 10:45:30 slave /netbsd: vpanic() at netbsd:vpanic+0x13c
Mar 3 10:45:30 slave /netbsd: snprintf() at netbsd:snprintf
Mar 3 10:45:30 slave /netbsd: rmc_dequeue_next() at netbsd:rmc_dequeue_next+0x50f
Mar 3 10:45:30 slave /netbsd: cbq_dequeue() at netbsd:cbq_dequeue+0x27
Mar 3 10:45:30 slave /netbsd: tbr_dequeue() at netbsd:tbr_dequeue+0x72
Mar 3 10:45:30 slave /netbsd: sppp_dequeue() at netbsd:sppp_dequeue+0x111
Mar 3 10:45:30 slave /netbsd: pppoe_start() at netbsd:pppoe_start+0x30
Mar 3 10:45:30 slave /netbsd: tbr_timeout() at netbsd:tbr_timeout+0x4e
Mar 3 10:45:30 slave /netbsd: callout_softclock() at netbsd:callout_softclock+0x248
Mar 3 10:45:30 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0x79
Mar 3 10:45:30 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe804000bff0
Mar 3 10:45:30 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: Fatal page fault in cbq_enqueue()
In article <20170127111545.GB9450@slave.private>,
Paul Ripke wrote:
>Still happening, although not as often. Maybe the hot weather helps...
>
>I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.

How about this? At least you'll not crash

christos

Index: altq_classq.h
===
RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
retrieving revision 1.7
diff -u -u -r1.7 altq_classq.h
--- altq_classq.h 12 Oct 2006 19:59:08 - 1.7
+++ altq_classq.h 27 Jan 2017 18:10:12 -
@@ -108,9 +108,9 @@
 {
     struct mbuf *m, *m0;

-    if ((m = qtail(q)) == NULL)
+    if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
         return (NULL);
-    if ((m0 = m->m_nextpkt) != m)
+    if (m0 != m)
         m->m_nextpkt = m0->m_nextpkt;
     else
         qtail(q) = NULL;
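
For readers following the patch outside the kernel tree, here is a minimal user-space sketch of what the extra NULL check does. It assumes the class queue is a circular singly linked list reached through its tail, as the quoted `_getq()` code suggests. The names `struct pkt`, `struct ring_queue`, `ring_addq`, `ring_getq` and `demo_guarded_getq` are hypothetical stand-ins and are not the altq API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbuf: only the packet link is modelled. */
struct pkt {
    struct pkt *nextpkt;        /* plays the role of m_nextpkt */
};

/* Hypothetical stand-in for the class queue. */
struct ring_queue {
    struct pkt *tail;           /* plays the role of qtail(q); tail->nextpkt is the head */
    int len;                    /* plays the role of qlen(q) */
};

/* Append p at the tail, keeping the list circular. */
static void ring_addq(struct ring_queue *q, struct pkt *p)
{
    if (q->tail != NULL) {
        p->nextpkt = q->tail->nextpkt;   /* new tail points at the head */
        q->tail->nextpkt = p;
    } else {
        p->nextpkt = p;                  /* single element points at itself */
    }
    q->tail = p;
    q->len++;
}

/* Remove and return the head, with the patch's extra NULL guard. */
static struct pkt *ring_getq(struct ring_queue *q)
{
    struct pkt *m, *m0;

    if ((m = q->tail) == NULL || (m0 = m->nextpkt) == NULL)
        return NULL;            /* empty queue, or ring already broken */
    if (m0 != m)
        m->nextpkt = m0->nextpkt;
    else
        q->tail = NULL;
    q->len--;
    m0->nextpkt = NULL;
    return m0;
}

/* Walk through the normal case, then the broken-ring case from the core dump. */
int demo_guarded_getq(void)
{
    struct ring_queue q = { NULL, 0 };
    struct pkt a = { NULL }, b = { NULL };

    ring_addq(&q, &a);
    ring_addq(&q, &b);
    assert(ring_getq(&q) == &a);    /* FIFO: a was queued first */

    b.nextpkt = NULL;               /* simulate the tail's link being NULL */
    assert(ring_getq(&q) == NULL);  /* guard returns instead of dereferencing NULL */
    assert(q.len == 1 && q.tail == &b);  /* the queue itself is not repaired */
    return 0;
}
```

In this model the guard only avoids the NULL dereference: the length stays non-zero and the tail still points at the orphaned packet, so whatever broke the ring is still unaddressed, which fits the "At least you'll not crash" framing of the patch.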
Re: Fatal page fault in cbq_enqueue()
Still happening, although not as often. Maybe the hot weather helps...

I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.

(gdb) bt
#0  0x80647ff5 in cpu_reboot (howto=howto@entry=260, bootstr=bootstr@entry=0x0)
    at /home/netbsd/netbsd-7/src/sys/arch/amd64/amd64/machdep.c:671
#1  0x808720c2 in vpanic (fmt=fmt@entry=0x80d23a12 "trap", ap=ap@entry=0xfe80400057a0)
    at /home/netbsd/netbsd-7/src/sys/kern/subr_prf.c:340
#2  0x8087217d in panic (fmt=fmt@entry=0x80d23a12 "trap")
    at /home/netbsd/netbsd-7/src/sys/kern/subr_prf.c:256
#3  0x808b5476 in trap (frame=0xfe80400058c0)
    at /home/netbsd/netbsd-7/src/sys/arch/amd64/amd64/trap.c:298
#4  0x80100f46 in alltraps ()
#5  0x80179806 in _getq (q=0xfe810a5d7628, q=0xfe810a5d7628)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_classq.h:113
#6  _rmc_dropq (cl=)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:1621
#7  rmc_drop_action (cl=0xfe811e4b4c08)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:1434
#8  rmc_queue_packet (cl=cl@entry=0xfe811e4b4c08, m=m@entry=0xfe8095f79200)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:802
#9  0x80178327 in cbq_enqueue (ifq=0xfe8108d09120, m=0xfe8095f79200, pktattr=)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_cbq.c:536
#10 0x803cf62c in ifq_enqueue2 (ifp=ifp@entry=0xfe8108d09008, ifq=ifq@entry=0x0, m=m@entry=0xfe8095f79200, pktattr=pktattr@entry=0xfe8040005a60)
    at /home/netbsd/netbsd-7/src/sys/net/if.c:2227
#11 0x80471689 in sppp_output (ifp=0xfe8108d09008, m=0xfe8095f79200, dst=, rt=)
    at /home/netbsd/netbsd-7/src/sys/net/if_spppsubr.c:882
#12 0x80690976 in nd6_output (ifp=ifp@entry=0xfe8108d09008, origifp=origifp@entry=0xfe8108d09008, m0=m0@entry=0xfe8095f79200, dst=0xfe8108be1388, rt0=)
    at /home/netbsd/netbsd-7/src/sys/netinet6/nd6.c:2335
#13 0x80550b6f in ipf_fastroute6 (mpp=0xfe8040005bf8, fdp=0x0, fin=0xfe8040005c00, m0=0xfe8095f79200)
    at /home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1444
#14 ipf_fastroute (m0=0xfe8095f79200, mpp=mpp@entry=0xfe8040005bf8, fin=fin@entry=0xfe8040005c00, fdp=fdp@entry=0x0)
    at /home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1068
#15 0x8054fb4d in ipf_send_ip (fin=fin@entry=0xfe8040005d60, m=m@entry=0xfe8095f79200)
    at /home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:849
#16 0x8054fda8 in ipf_send_reset (fin=fin@entry=0xfe8040005d60)
    at /home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:775
#17 0x803471af in ipf_check (ctx=0x8105b560 , ip=, hlen=, ifp=, out=0, mp=0xfe8040005eb0)
    at /home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/fil.c:3081
#18 0x8071d26e in pfil_run_hooks (ph=, mp=mp@entry=0xfe8040005ef8, ifp=0xfe8108d09008, dir=dir@entry=1)
    at /home/netbsd/netbsd-7/src/sys/net/pfil.c:266
#19 0x8053bfbc in ip6_input (m=0xfe80d9337800)
    at /home/netbsd/netbsd-7/src/sys/netinet6/ip6_input.c:350
#20 0x8053ca05 in ip6intr (arg=)
    at /home/netbsd/netbsd-7/src/sys/netinet6/ip6_input.c:238
#21 0x805e9eaa in softint_execute (l=, s=, si=)
    at /home/netbsd/netbsd-7/src/sys/kern/kern_softint.c:589
#22 softint_dispatch (pinned=, s=4)
    at /home/netbsd/netbsd-7/src/sys/kern/kern_softint.c:871
#23 0x8011402f in Xsoftintr ()
(gdb) f 5
#5  0x80179806 in _getq (q=0xfe810a5d7628, q=0xfe810a5d7628)
    at /home/netbsd/netbsd-7/src/sys/altq/altq_classq.h:113
113         if ((m0 = m->m_nextpkt) != m)
(gdb) list
108     {
109         struct mbuf *m, *m0;
110
111         if ((m = qtail(q)) == NULL)
112             return (NULL);
113         if ((m0 = m->m_nextpkt) != m)
114             m->m_nextpkt = m0->m_nextpkt;
115         else
116             qtail(q) = NULL;
117         qlen(q)--;
(gdb) p m0
$1 = (struct mbuf *) 0x0
(gdb) p m->m_hdr
$2 = {
  mh_next = 0x0,
  mh_nextpkt = 0x0,
  mh_data = 0xfe8095f79266 "",
  mh_owner = 0x0,
  mh_len = 62,
  mh_flags = 2,
  mh_paddr = 2516029952,
  mh_type = 2
}

That's clearly not going to work. Ideas?

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
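
The core shows a tail packet whose `m_nextpkt` is NULL, a state a circular list should never reach. As one possible debugging aid (an assumption, not something proposed in this thread), a consistency check along the following lines could be called around enqueue and dequeue to catch the corruption closer to where it happens. This is a user-space sketch: `struct pkt`, `struct ring_queue`, `ring_is_consistent` and `demo_ring_check` are hypothetical names, not part of altq.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mbuf and the altq class queue. */
struct pkt {
    struct pkt *nextpkt;        /* plays the role of m_nextpkt */
};

struct ring_queue {
    struct pkt *tail;           /* plays the role of qtail(q) */
    int len;                    /* plays the role of qlen(q) */
};

/*
 * Return 1 if following nextpkt from the tail visits exactly len packets
 * and arrives back at the tail, 0 otherwise. The walk is bounded by len,
 * so a cycle that never returns to the tail cannot make it loop forever.
 */
int ring_is_consistent(const struct ring_queue *q)
{
    const struct pkt *p;
    int seen = 0;

    if (q->tail == NULL)
        return q->len == 0;
    p = q->tail;
    do {
        p = p->nextpkt;
        if (p == NULL || ++seen > q->len)
            return 0;           /* broken link, or more packets than len */
    } while (p != q->tail);
    return seen == q->len;
}

/* Returns 0 if the checker accepts a good ring and rejects a broken one. */
int demo_ring_check(void)
{
    struct pkt a, b;
    struct ring_queue q = { &b, 2 };

    a.nextpkt = &b;             /* well-formed two-packet ring: b -> a -> b */
    b.nextpkt = &a;
    if (!ring_is_consistent(&q))
        return 1;

    b.nextpkt = NULL;           /* state seen in the core: tail's link is NULL */
    if (ring_is_consistent(&q))
        return 2;
    return 0;
}
```

A check like this costs a walk of the whole queue on every call, so it would only make sense in a diagnostic kernel build.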
Re: Fatal page fault in cbq_enqueue()
So, just wondering if anyone else is using altq, or if there's some
replacement I don't know about?

I'm running NetBSD 7.0_STABLE amd64 as of 2016-10-12, and it's panicking
a couple of times a week. Latest was at:

Oct 17 07:03:04 slave /netbsd: panic: _rmc_wrr_dequeue_next
Oct 17 07:03:04 slave /netbsd: cpu0: Begin traceback...
Oct 17 07:03:04 slave /netbsd: vpanic() at netbsd:vpanic+0x13c
Oct 17 07:03:04 slave /netbsd: snprintf() at netbsd:snprintf
Oct 17 07:03:04 slave /netbsd: rmc_dequeue_next() at netbsd:rmc_dequeue_next+0x4fd
Oct 17 07:03:04 slave /netbsd: cbq_dequeue() at netbsd:cbq_dequeue+0x27
Oct 17 07:03:04 slave /netbsd: tbr_dequeue() at netbsd:tbr_dequeue+0x72
Oct 17 07:03:04 slave /netbsd: sppp_dequeue() at netbsd:sppp_dequeue+0x111
Oct 17 07:03:04 slave /netbsd: pppoe_start() at netbsd:pppoe_start+0x30
Oct 17 07:03:04 slave /netbsd: sppp_output() at netbsd:sppp_output+0x440
Oct 17 07:03:04 slave /netbsd: ip_output() at netbsd:ip_output+0xc4e
Oct 17 07:03:04 slave /netbsd: tcp_output() at netbsd:tcp_output+0x14ac
Oct 17 07:03:04 slave /netbsd: tcp_input() at netbsd:tcp_input+0xd67
Oct 17 07:03:04 slave /netbsd: ipintr() at netbsd:ipintr+0x81b
Oct 17 07:03:04 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0x79
Oct 17 07:03:04 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe8040005ff0
Oct 17 07:03:04 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
Oct 17 07:03:04 slave /netbsd: --- interrupt ---
Oct 17 07:03:04 slave /netbsd: 0:
Oct 17 07:03:04 slave /netbsd: cpu0: End traceback...

--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.