Re: Fatal page fault in cbq_enqueue()

2017-10-16 Thread Paul Ripke
On Mon, Oct 09, 2017 at 03:39:41AM -0700, Aaron J. Grier wrote:
> On Sat, Oct 07, 2017 at 07:33:45PM +1100, Paul Ripke wrote:
> > Ok, I've bitten the bullet and switched to npf and altq. Setup seems
> > to be working so far, and it's fixed another annoyance I recently
> > noticed: ipf was blocking in-bound syn-acks from select remote sites.
> > 99% of sites were fine, a handful of sites broke. It's all fine with
> > npf.
> 
> got an example of conf files?  or is it separately using npf.conf and
> altq.conf?

Nope, just separately using npf.conf & altq.conf. Seems to work as
expected, at least after I figured out pr/52609, which caught me by
surprise.

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.


Re: Fatal page fault in cbq_enqueue()

2017-10-09 Thread Aaron J. Grier
On Sat, Oct 07, 2017 at 07:33:45PM +1100, Paul Ripke wrote:
> Ok, I've bitten the bullet and switched to npf and altq. Setup seems
> to be working so far, and it's fixed another annoyance I recently
> noticed: ipf was blocking in-bound syn-acks from select remote sites.
> 99% of sites were fine, a handful of sites broke. It's all fine with
> npf.

got an example of conf files?  or is it separately using npf.conf and
altq.conf?

-- 
  Aaron J. Grier | "Not your ordinary poofy goof." | agr...@poofygoof.com


Re: Fatal page fault in cbq_enqueue()

2017-10-08 Thread Christos Zoulas
On Oct 7,  7:33pm, s...@stix.id.au (Paul Ripke) wrote:
-- Subject: Re: Fatal page fault in cbq_enqueue()

| Ok, I've bitten the bullet and switched to npf and altq. Setup seems
| to be working so far, and it's fixed another annoyance I recently
| noticed: ipf was blocking in-bound syn-acks from select remote sites.
| 99% of sites were fine, a handful of sites broke. It's all fine with
| npf.

Great :-)

christos


Re: Fatal page fault in cbq_enqueue()

2017-10-08 Thread Paul Ripke
Ok, I've bitten the bullet and switched to npf and altq. Setup seems
to be working so far, and it's fixed another annoyance I recently
noticed: ipf was blocking in-bound syn-acks from select remote sites.
99% of sites were fine, a handful of sites broke. It's all fine with
npf.

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.


Re: Fatal page fault in cbq_enqueue()

2017-09-26 Thread Brian Buhrow
Hello.  We use altq plus pf all over the place in our company.  We've
looked at using npf, but it doesn't have the feature set we need to make
all of our stuff go.  Right now, we're using NetBSD-5, which is rock solid
in terms of reliability.  I don't know if it's easier to make pf and altq
work in NET_MPSAFE mode, or add the missing functionality to npf, but for
my part, I think it's easier for me to fix pf plus altq in terms of using
it in an SMP environment than it is to add the missing features to npf.
-thanks
-Brian

On Sep 26, 11:37am, Paul Ripke wrote:
} Subject: Re: Fatal page fault in cbq_enqueue()
} Recently upgraded to netbsd-8 branch, and I'm still seeing these
} occasionally. E.g.:
} 
} Sep 25 20:57:16 slave /netbsd: fatal page fault in supervisor mode
} Sep 25 20:57:16 slave /netbsd: trap type 6 code 0 rip 0x807a68b9 cs 0x8 rflags 0x10286 cr2 0x8 ilevel 0x8 rsp 0xfe80400077e0
} Sep 25 20:57:16 slave /netbsd: curlwp 0xfe811f932420 pid 0.3 lowest kstack 0xfe80400042c0
} Sep 25 20:57:16 slave /netbsd: panic: trap
} Sep 25 20:57:16 slave /netbsd: cpu0: Begin traceback...
} Sep 25 20:57:16 slave /netbsd: vpanic() at netbsd:vpanic+0x140
} Sep 25 20:57:16 slave /netbsd: snprintf() at netbsd:snprintf
} Sep 25 20:57:16 slave /netbsd: trap() at netbsd:trap+0xc6b
} Sep 25 20:57:16 slave /netbsd: --- trap (number 6) ---
} Sep 25 20:57:16 slave /netbsd: rmc_queue_packet() at netbsd:rmc_queue_packet+0x150
} Sep 25 20:57:16 slave /netbsd: cbq_enqueue() at netbsd:cbq_enqueue+0xee
} Sep 25 20:57:16 slave /netbsd: ifq_enqueue2() at netbsd:ifq_enqueue2+0xc4
} Sep 25 20:57:16 slave /netbsd: sppp_output() at netbsd:sppp_output+0x1ab
} Sep 25 20:57:16 slave /netbsd: ip6_if_output() at netbsd:ip6_if_output+0x60
} Sep 25 20:57:16 slave /netbsd: ipf_fastroute() at netbsd:ipf_fastroute+0x97e
} Sep 25 20:57:16 slave /netbsd: ipf_send_ip() at netbsd:ipf_send_ip+0x13d
} Sep 25 20:57:16 slave /netbsd: ipf_check() at netbsd:ipf_check+0xcfc
} Sep 25 20:57:16 slave /netbsd: pfil_run_hooks() at netbsd:pfil_run_hooks+0x117
} Sep 25 20:57:16 slave /netbsd: ip6_input() at netbsd:ip6_input+0x278
} Sep 25 20:57:16 slave /netbsd: ip6intr() at netbsd:ip6intr+0x71
} Sep 25 20:57:16 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0xd3
} Sep 25 20:57:16 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe8040007ff0
} Sep 25 20:57:16 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
} 
} Are there many users of altq+ipf out there? Perhaps now is a good time to
} switch to npf...
} 
} -- 
} Paul Ripke
} "Great minds discuss ideas, average minds discuss events, small minds
}  discuss people."
} -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>-- End of excerpt from Paul Ripke




Re: Fatal page fault in cbq_enqueue()

2017-03-09 Thread Paul Ripke
On Wed, Mar 08, 2017 at 08:53:56PM -0500, Christos Zoulas wrote:
> On Mar 9, 12:16pm, s...@stix.id.au (Paul Ripke) wrote:
> -- Subject: Re: Fatal page fault in cbq_enqueue()
> 
> | > > Index: altq_classq.h
> | > > ===
> | > > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
> | > > retrieving revision 1.7
> | > > diff -u -u -r1.7 altq_classq.h
> | > > --- altq_classq.h   12 Oct 2006 19:59:08 -  1.7
> | > > +++ altq_classq.h   27 Jan 2017 18:10:12 -
> | > > @@ -108,9 +108,9 @@
> | > >  {
> | > >  	struct mbuf  *m, *m0;
> | > >  
> | > > -	if ((m = qtail(q)) == NULL)
> | > > +	if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
> | > >  		return (NULL);
> | > > -	if ((m0 = m->m_nextpkt) != m)
> | > > +	if (m0 != m)
> | > >  		m->m_nextpkt = m0->m_nextpkt;
> | > >  	else
> | > >  		qtail(q) = NULL;
> | > 
> | > Indeed... Well, we'll see how it goes, I'm running with that now. I've
> | > had one crash since, so a couple of weeks might be enough to have some
> | > idea.
> | 
> | Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.
> 
> So it took how many days?

Booted with that patch around Jan 28. I think I've had a couple of
silent reboots since, followed by:

Mar  3 10:45:30 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar  3 10:47:42 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar  9 11:52:03 slave /netbsd: panic: _rmc_wrr_dequeue_next

That's a remarkably tight cluster of crashes.

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.


Re: Fatal page fault in cbq_enqueue()

2017-03-08 Thread Christos Zoulas
On Mar 9, 12:16pm, s...@stix.id.au (Paul Ripke) wrote:
-- Subject: Re: Fatal page fault in cbq_enqueue()

| > > Index: altq_classq.h
| > > ===
| > > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
| > > retrieving revision 1.7
| > > diff -u -u -r1.7 altq_classq.h
| > > --- altq_classq.h 12 Oct 2006 19:59:08 -  1.7
| > > +++ altq_classq.h 27 Jan 2017 18:10:12 -
| > > @@ -108,9 +108,9 @@
| > >  {
| > >  	struct mbuf  *m, *m0;
| > >  
| > > -	if ((m = qtail(q)) == NULL)
| > > +	if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
| > >  		return (NULL);
| > > -	if ((m0 = m->m_nextpkt) != m)
| > > +	if (m0 != m)
| > >  		m->m_nextpkt = m0->m_nextpkt;
| > >  	else
| > >  		qtail(q) = NULL;
| > 
| > Indeed... Well, we'll see how it goes, I'm running with that now. I've
| > had one crash since, so a couple of weeks might be enough to have some
| > idea.
| 
| Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.

So it took how many days?

christos


Re: Fatal page fault in cbq_enqueue()

2017-03-08 Thread Paul Ripke
On Wed, Feb 08, 2017 at 11:09:55PM +1100, Paul Ripke wrote:
> On Fri, Jan 27, 2017 at 06:10:52PM +, Christos Zoulas wrote:
> > In article <20170127111545.GB9450@slave.private>,
> > Paul Ripke   wrote:
> > >Still happening, although not as often. Maybe the hot weather helps...
> > >
> > >I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.
> > 
> > How about this? At least you'll not crash
> > 
> > christos
> > 
> > Index: altq_classq.h
> > ===
> > RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
> > retrieving revision 1.7
> > diff -u -u -r1.7 altq_classq.h
> > --- altq_classq.h   12 Oct 2006 19:59:08 -  1.7
> > +++ altq_classq.h   27 Jan 2017 18:10:12 -
> > @@ -108,9 +108,9 @@
> >  {
> >  	struct mbuf  *m, *m0;
> >  
> > -	if ((m = qtail(q)) == NULL)
> > +	if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
> >  		return (NULL);
> > -	if ((m0 = m->m_nextpkt) != m)
> > +	if (m0 != m)
> >  		m->m_nextpkt = m0->m_nextpkt;
> >  	else
> >  		qtail(q) = NULL;
> 
> Indeed... Well, we'll see how it goes, I'm running with that now. I've
> had one crash since, so a couple of weeks might be enough to have some
> idea.

Pity. Crashed elsewhere. I think there's a definite race in altq somewhere.

Mar  3 10:45:30 slave /netbsd: panic: _rmc_wrr_dequeue_next
Mar  3 10:45:30 slave /netbsd: cpu0: Begin traceback...
Mar  3 10:45:30 slave /netbsd: vpanic() at netbsd:vpanic+0x13c
Mar  3 10:45:30 slave /netbsd: snprintf() at netbsd:snprintf
Mar  3 10:45:30 slave /netbsd: rmc_dequeue_next() at netbsd:rmc_dequeue_next+0x50f
Mar  3 10:45:30 slave /netbsd: cbq_dequeue() at netbsd:cbq_dequeue+0x27
Mar  3 10:45:30 slave /netbsd: tbr_dequeue() at netbsd:tbr_dequeue+0x72
Mar  3 10:45:30 slave /netbsd: sppp_dequeue() at netbsd:sppp_dequeue+0x111
Mar  3 10:45:30 slave /netbsd: pppoe_start() at netbsd:pppoe_start+0x30
Mar  3 10:45:30 slave /netbsd: tbr_timeout() at netbsd:tbr_timeout+0x4e
Mar  3 10:45:30 slave /netbsd: callout_softclock() at netbsd:callout_softclock+0x248
Mar  3 10:45:30 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0x79
Mar  3 10:45:30 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe804000bff0
Mar  3 10:45:30 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.


Re: Fatal page fault in cbq_enqueue()

2017-01-27 Thread Christos Zoulas
In article <20170127111545.GB9450@slave.private>,
Paul Ripke   wrote:
>Still happening, although not as often. Maybe the hot weather helps...
>
>I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.

How about this? At least you'll not crash

christos

Index: altq_classq.h
===
RCS file: /cvsroot/src/sys/altq/altq_classq.h,v
retrieving revision 1.7
diff -u -u -r1.7 altq_classq.h
--- altq_classq.h   12 Oct 2006 19:59:08 -  1.7
+++ altq_classq.h   27 Jan 2017 18:10:12 -
@@ -108,9 +108,9 @@
 {
 	struct mbuf  *m, *m0;
 
-	if ((m = qtail(q)) == NULL)
+	if ((m = qtail(q)) == NULL || (m0 = m->m_nextpkt) == NULL)
 		return (NULL);
-	if ((m0 = m->m_nextpkt) != m)
+	if (m0 != m)
 		m->m_nextpkt = m0->m_nextpkt;
 	else
 		qtail(q) = NULL;
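
For readers following the patch, here is a minimal user-space sketch of the
queue it touches, assuming the class queue is a circular singly linked list
tracked only by its tail (so tail->next is the head). The names pkt, classq,
addq, getq and classq_selftest are hypothetical stand-ins, not the kernel's;
the last two statements of getq() are assumed, since the thread only shows
the function up to qlen(q)--.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for struct mbuf and class_queue_t. */
struct pkt {
	struct pkt *nextpkt;	/* plays the role of m_nextpkt */
};

struct classq {
	struct pkt *tail;	/* plays the role of qtail(q) */
	int len;		/* plays the role of qlen(q) */
};

/* Append p to a circular list in which tail->nextpkt is the head. */
static void
addq(struct classq *q, struct pkt *p)
{
	if (q->tail != NULL) {
		p->nextpkt = q->tail->nextpkt;
		q->tail->nextpkt = p;
	} else {
		p->nextpkt = p;		/* a single element points at itself */
	}
	q->tail = p;
	q->len++;
}

/* Remove the head, with the extra NULL check from the patch. */
static struct pkt *
getq(struct classq *q)
{
	struct pkt *m, *m0;

	if ((m = q->tail) == NULL || (m0 = m->nextpkt) == NULL)
		return NULL;		/* empty, or the ring is already broken */
	if (m0 != m)
		m->nextpkt = m0->nextpkt;
	else
		q->tail = NULL;
	q->len--;
	m0->nextpkt = NULL;		/* assumed: detach the returned packet */
	return m0;
}

/* Returns 0 if the model behaves as described, -1 otherwise. */
static int
classq_selftest(void)
{
	struct pkt a, b;
	struct classq q = { NULL, 0 };

	addq(&q, &a);
	addq(&q, &b);
	if (getq(&q) != &a || q.len != 1)
		return -1;

	/* Recreate a broken ring: the tail's next pointer is NULL. */
	b.nextpkt = NULL;
	if (getq(&q) != NULL)
		return -1;
	/* The guard avoids the fault but leaves tail and len untouched. */
	return (q.tail == &b && q.len == 1) ? 0 : -1;
}
```

In this model the added check only avoids the NULL dereference. It does not
repair the ring, so the queue still reports one packet that can never be
dequeued.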



Re: Fatal page fault in cbq_enqueue()

2017-01-27 Thread Paul Ripke
Still happening, although not as often. Maybe the hot weather helps...

I've got a core this time, on amd64 7.0_STABLE as of 2016-10-12.

(gdb) bt
#0  0x80647ff5 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0) at 
/home/netbsd/netbsd-7/src/sys/arch/amd64/amd64/machdep.c:671
#1  0x808720c2 in vpanic (fmt=fmt@entry=0x80d23a12 "trap", 
ap=ap@entry=0xfe80400057a0) at 
/home/netbsd/netbsd-7/src/sys/kern/subr_prf.c:340
#2  0x8087217d in panic (fmt=fmt@entry=0x80d23a12 "trap") at 
/home/netbsd/netbsd-7/src/sys/kern/subr_prf.c:256
#3  0x808b5476 in trap (frame=0xfe80400058c0) at 
/home/netbsd/netbsd-7/src/sys/arch/amd64/amd64/trap.c:298
#4  0x80100f46 in alltraps ()
#5  0x80179806 in _getq (q=0xfe810a5d7628, q=0xfe810a5d7628) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_classq.h:113
#6  _rmc_dropq (cl=) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:1621
#7  rmc_drop_action (cl=0xfe811e4b4c08) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:1434
#8  rmc_queue_packet (cl=cl@entry=0xfe811e4b4c08, 
m=m@entry=0xfe8095f79200) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_rmclass.c:802
#9  0x80178327 in cbq_enqueue (ifq=0xfe8108d09120, 
m=0xfe8095f79200, pktattr=) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_cbq.c:536
#10 0x803cf62c in ifq_enqueue2 (ifp=ifp@entry=0xfe8108d09008, 
ifq=ifq@entry=0x0, m=m@entry=0xfe8095f79200, 
pktattr=pktattr@entry=0xfe8040005a60) at 
/home/netbsd/netbsd-7/src/sys/net/if.c:2227
#11 0x80471689 in sppp_output (ifp=0xfe8108d09008, 
m=0xfe8095f79200, dst=, rt=) at 
/home/netbsd/netbsd-7/src/sys/net/if_spppsubr.c:882
#12 0x80690976 in nd6_output (ifp=ifp@entry=0xfe8108d09008, 
origifp=origifp@entry=0xfe8108d09008, m0=m0@entry=0xfe8095f79200, 
dst=0xfe8108be1388, rt0=) at 
/home/netbsd/netbsd-7/src/sys/netinet6/nd6.c:2335
#13 0x80550b6f in ipf_fastroute6 (mpp=0xfe8040005bf8, fdp=0x0, 
fin=0xfe8040005c00, m0=0xfe8095f79200) at 
/home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1444
#14 ipf_fastroute (m0=0xfe8095f79200, mpp=mpp@entry=0xfe8040005bf8, 
fin=fin@entry=0xfe8040005c00, fdp=fdp@entry=0x0) at 
/home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:1068
#15 0x8054fb4d in ipf_send_ip (fin=fin@entry=0xfe8040005d60, 
m=m@entry=0xfe8095f79200) at 
/home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:849
#16 0x8054fda8 in ipf_send_reset (fin=fin@entry=0xfe8040005d60) at 
/home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/ip_fil_netbsd.c:775
#17 0x803471af in ipf_check (ctx=0x8105b560 , 
ip=, hlen=, ifp=, out=0, 
mp=0xfe8040005eb0) at 
/home/netbsd/netbsd-7/src/sys/external/bsd/ipf/netinet/fil.c:3081
#18 0x8071d26e in pfil_run_hooks (ph=, 
mp=mp@entry=0xfe8040005ef8, ifp=0xfe8108d09008, dir=dir@entry=1) at 
/home/netbsd/netbsd-7/src/sys/net/pfil.c:266
#19 0x8053bfbc in ip6_input (m=0xfe80d9337800) at 
/home/netbsd/netbsd-7/src/sys/netinet6/ip6_input.c:350
#20 0x8053ca05 in ip6intr (arg=) at 
/home/netbsd/netbsd-7/src/sys/netinet6/ip6_input.c:238
#21 0x805e9eaa in softint_execute (l=, s=, si=) at 
/home/netbsd/netbsd-7/src/sys/kern/kern_softint.c:589
#22 softint_dispatch (pinned=, s=4) at 
/home/netbsd/netbsd-7/src/sys/kern/kern_softint.c:871
#23 0x8011402f in Xsoftintr ()
(gdb) f 5
#5  0x80179806 in _getq (q=0xfe810a5d7628, q=0xfe810a5d7628) at 
/home/netbsd/netbsd-7/src/sys/altq/altq_classq.h:113
113		if ((m0 = m->m_nextpkt) != m)
(gdb) list
108	{
109		struct mbuf  *m, *m0;
110
111		if ((m = qtail(q)) == NULL)
112			return (NULL);
113		if ((m0 = m->m_nextpkt) != m)
114			m->m_nextpkt = m0->m_nextpkt;
115		else
116			qtail(q) = NULL;
117		qlen(q)--;
(gdb) p m0
$1 = (struct mbuf *) 0x0
(gdb) p m->m_hdr
$2 = {
  mh_next = 0x0,
  mh_nextpkt = 0x0,
  mh_data = 0xfe8095f79266 "",
  mh_owner = 0x0,
  mh_len = 62,
  mh_flags = 2,
  mh_paddr = 2516029952,
  mh_type = 2
}

That's clearly not going to work. Ideas?
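
The gdb output shows a tail packet whose next-packet pointer is NULL, which a
circular list should never contain. As a rough illustration, the check below
walks such a ring and reports whether it is consistent. It is a user-space
sketch under the assumption that the queue is a tail-tracked circular list;
pkt, classq and classq_check are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mbuf and class_queue_t. */
struct pkt {
	struct pkt *nextpkt;	/* plays the role of m_nextpkt */
};

struct classq {
	struct pkt *tail;	/* plays the role of qtail(q) */
	int len;		/* plays the role of qlen(q) */
};

/*
 * Return 0 if following nextpkt from the tail comes back to the tail
 * after exactly len steps, and -1 for any other shape (NULL link,
 * ring shorter or longer than len, or a length/tail mismatch).
 */
static int
classq_check(const struct classq *q)
{
	const struct pkt *p;
	int i;

	assert(q != NULL);
	if (q->tail == NULL)
		return q->len == 0 ? 0 : -1;
	if (q->len <= 0)
		return -1;

	p = q->tail;
	for (i = 0; i < q->len; i++) {
		p = p->nextpkt;
		if (p == NULL)
			return -1;	/* the state seen in the core dump */
		if (p == q->tail && i != q->len - 1)
			return -1;	/* ring is shorter than len */
	}
	return p == q->tail ? 0 : -1;
}
```

A check of this kind, called on entry to the enqueue and dequeue paths in a
debug build, might help narrow down where the ring first gets broken rather
than where the damage is later noticed.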

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.


Re: Fatal page fault in cbq_enqueue()

2016-10-16 Thread Paul Ripke
So, just wondering if anyone else is using altq, or if there's some
replacement I don't know about?

I'm running NetBSD 7.0_STABLE amd64 as of 2016-10-12, and it's
panicking a couple of times a week. The latest was at:

Oct 17 07:03:04 slave /netbsd: panic: _rmc_wrr_dequeue_next
Oct 17 07:03:04 slave /netbsd: cpu0: Begin traceback...
Oct 17 07:03:04 slave /netbsd: vpanic() at netbsd:vpanic+0x13c
Oct 17 07:03:04 slave /netbsd: snprintf() at netbsd:snprintf
Oct 17 07:03:04 slave /netbsd: rmc_dequeue_next() at netbsd:rmc_dequeue_next+0x4fd
Oct 17 07:03:04 slave /netbsd: cbq_dequeue() at netbsd:cbq_dequeue+0x27
Oct 17 07:03:04 slave /netbsd: tbr_dequeue() at netbsd:tbr_dequeue+0x72
Oct 17 07:03:04 slave /netbsd: sppp_dequeue() at netbsd:sppp_dequeue+0x111
Oct 17 07:03:04 slave /netbsd: pppoe_start() at netbsd:pppoe_start+0x30
Oct 17 07:03:04 slave /netbsd: sppp_output() at netbsd:sppp_output+0x440
Oct 17 07:03:04 slave /netbsd: ip_output() at netbsd:ip_output+0xc4e
Oct 17 07:03:04 slave /netbsd: tcp_output() at netbsd:tcp_output+0x14ac
Oct 17 07:03:04 slave /netbsd: tcp_input() at netbsd:tcp_input+0xd67
Oct 17 07:03:04 slave /netbsd: ipintr() at netbsd:ipintr+0x81b
Oct 17 07:03:04 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0x79
Oct 17 07:03:04 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe8040005ff0
Oct 17 07:03:04 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
Oct 17 07:03:04 slave /netbsd: --- interrupt ---
Oct 17 07:03:04 slave /netbsd: 0:
Oct 17 07:03:04 slave /netbsd: cpu0: End traceback...

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.