Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Mon, May 16, 2022 at 05:06:28PM +0200, Claudio Jeker wrote:
> > In veb configuration we are holding the netlock and sleep in
> > smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
> > is fine here.
>
> Are you sure about this? smr_barrier() on busy systems with many cpus can
> take 100ms and more. Isn't all processing stopped during that time (IIRC
> the code uses the write exclusive netlock).

Interface ioctls are called with the netlock held. veb_ioctl() calls
veb_add_port() and veb_del_port(), which call smr_barrier(). I did
not notice a 100ms network sleep during ifconfig. I would not
consider it an urgent problem. If you optimize for one thing, you
pay the price somewhere else.

> Sleeping with a global rwlock held is not a good design.

And here we are back to the net lock design discussion we had with
mpi@ years ago. More important is progress. We should not spend too
much time on design discussion; better to solve problems.

My plan is, in more or less this order:
- use exclusive netlock for configuration
- use shared netlock for parts that are MP safe
- kernel lock together with net lock allows quick fixes to make
  progress without making everything MP perfect
- replace kernel lock with mutex
- use something smarter than mutex to make things fast

Others have other plans, so we have different maturity of MP support
in the kernel.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Sat, May 14, 2022 at 12:41:00AM +0200, Alexander Bluhm wrote:
> On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> > at this point we hold a NET_LOCK(). So basically if there won't
> > be enough memory we might start sleeping waiting for memory
> > while we will be holding a NET_LOCK.
> >
> > This is something we should try to avoid, however this can be
> > sorted out later. At this point I just want to point out
> > this problem, which can certainly be solved in a follow-up
> > commit. pf(4) also has its homework to be done around
> > sleeping mallocs.
>
> I think sleeping with netlock is not a problem in general.
>
> In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
> or copyout(). This results in a lock order reversal due to a hack
> in uvn_io(). In my opinion we should not sleep within the pf lock, so
> we can convert it to a mutex or something better later.
>
> In veb configuration we are holding the netlock and sleep in
> smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
> is fine here.

Are you sure about this? smr_barrier() on busy systems with many cpus can
take 100ms and more. Isn't all processing stopped during that time (IIRC
the code uses the write exclusive netlock)?

Also for smr_barrier() the ownership of the object should have already
been moved exclusively to the thread, so it can call smr_barrier()
without holding any locks. I feel the same is true for refcnt_finalize().
Sleeping with a global rwlock held is not a good design.

> Holding the netlock and sleeping in m_get() is worse. There is no
> recovery after reaching the mbuf limit. Sleeping rules are
> inconsistent and depend on the area of the stack. Different people
> have multiple ideas how it should be done.

--
:wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> at this point we hold a NET_LOCK(). So basically if there won't
> be enough memory we might start sleeping waiting for memory
> while we will be holding a NET_LOCK.
>
> This is something we should try to avoid, however this can be
> sorted out later. At this point I just want to point out
> this problem, which can certainly be solved in a follow-up
> commit. pf(4) also has its homework to be done around
> sleeping mallocs.

I think sleeping with netlock is not a problem in general.

In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
or copyout(). This results in a lock order reversal due to a hack
in uvn_io(). In my opinion we should not sleep within the pf lock, so
we can convert it to a mutex or something better later.

In veb configuration we are holding the netlock and sleep in
smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
is fine here.

Holding the netlock and sleeping in m_get() is worse. There is no
recovery after reaching the mbuf limit. Sleeping rules are
inconsistent and depend on the area of the stack. Different people
have multiple ideas how it should be done.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?

Thanks for having a look. I added veb(4) to my setup. With this
diff, I cannot trigger a crash anymore.

OK bluhm@

> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
>
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
>
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
>
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.
> > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port *); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i > return (ENOMEM); > > rw_init(>sc_rule_lock, "vebrlk"); > - 
SMR_TAILQ_INIT(>sc_ports.l_list); > - SMR_TAILQ_INIT(>sc_spans.l_list); > + sc->sc_ports = NULL; > + sc->sc_spans = NULL; > > ifp = >sc_if; > > @@ -314,7 +331,10 @@ static int > veb_clone_destroy(struct ifnet *ifp) > { > struct veb_softc *sc = ifp->if_softc; > - struct veb_port *p, *np; > + struct veb_ports *mp, *ms; > + struct veb_port **ps; > + struct veb_port *p; > + unsigned int i; > > NET_LOCK(); > sc->sc_dead = 1; > @@ -326,10 +346,60 @@ veb_clone_destroy(struct ifnet *ifp) > if_detach(ifp); > > NET_LOCK(); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_ports.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_spans.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > + > + /* > + * this is an upside down version of veb_p_dtor() and > + * veb_ports_destroy() to avoid a lot of malloc/free and > + * smr_barrier calls if we remove ports one by one. > + */ > + > + mp = SMR_PTR_GET_LOCKED(>sc_ports); > + SMR_PTR_SET_LOCKED(>sc_ports, NULL); > + if (mp != NULL) { > + ps = veb_ports_array(mp); > + for (i = 0; i < mp->m_count; i++) { > + veb_p_unlink(sc, ps[i]); > + } > + } > +
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Dave,

> > sorry i'm late to the party. can you try this diff?

glad to see you are here. I think your diff looks good. I'm just
concerned about the memory allocation in veb_ports_insert(). The memory
is allocated with the `M_WAITOK` flag, which essentially means we may
give up the CPU. veb_ports_insert() is being called from veb_add_port()
here:

  1500		}
  1501
  1502		p->p_brport.eb_port_take = veb_eb_brport_take;
  1503		p->p_brport.eb_port_rele = veb_eb_brport_rele;
  1504
  1505		om = SMR_PTR_GET_LOCKED(ports_ptr);
  1506		nm = veb_ports_insert(om, p);
  1507
  1508		/* this might have changed if we slept for malloc or ifpromisc */
  1509		error = ether_brport_isset(ifp0);

at this point we hold a NET_LOCK(). So basically if there won't
be enough memory we might start sleeping waiting for memory
while we will be holding a NET_LOCK.

This is something we should try to avoid, however this can be
sorted out later. At this point I just want to point out
this problem, which can certainly be solved in a follow-up
commit. pf(4) also has its homework to be done around
sleeping mallocs.

I think your diff should go in as is.

thanks and regards

OK sashan
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > > >>> I think the diff below may be enough to fix this issue. It drops the SMR > > >>> critical secition around the enqueue operation but uses a reference on > > >>> the > > >>> port insteadt to ensure that the device can't be removed during the > > >>> enqueue. Once the enqueue is finished we enter the SMR critical section > > >>> again and drop the reference. > > >>> > > >>> To make it clear that the SMR_TAILQ remains intact while a refcount is > > >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is > > >>> not strictly needed since the next pointer remains valid up until the > > >>> smr_barrier() call but I find this a bit easier to understand. > > >>> First make sure nobody else holds a reference then remove the port from > > >>> the list. > > >>> > > >>> I currently have no test setup to verify this but I hope someone else > > >>> can > > >>> give this a spin. > > >> Hi, > > >> > > >> for now veb seems stable and i can't panic box although it should, but > > >> please give me few more hours to torture it properly. > > > > > > > > > I can trigger panic in veb with this diff. > > > > > > Thank you .. > > > > > > > > > > I can't trigger ... :))) sorry .. > > sorry i'm late to the party. can you try this diff? > > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all the > ports. the forwarding path uses smr to get hold of a map, takes a map > ref, and then leaves the smr crit section before iterating over the map > and pushing packets. 
> > this means we should only take and release a single refcnt when > we're pushing packets out any number of ports. > > if no span ports are configured, then there's no span port map and > we don't try and take a ref, we can just return early. > > we also only take and release a single refcnt when we forward the > actual packet. forwarding to a single port provided by an etherbridge > lookup already takes/releases the single port ref. if it falls > through that for unknown unicast or broadcast/multicast, then it's > a single refcnt for the current map of all ports. This is very smart and probably better since it uses less atomic instructions per packet. Diff reads fine. OK claudio@ > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port 
*); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 13.5.2022. 4:19, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?
>
> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
>
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
>
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
>
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.

Hi,

and with this diff i can't trigger panic ...
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote:
> On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> > On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> >> On 12.5.2022. 14:48, Claudio Jeker wrote:
> >>> I think the diff below may be enough to fix this issue. It drops the SMR
> >>> critical section around the enqueue operation but uses a reference on the
> >>> port instead to ensure that the device can't be removed during the
> >>> enqueue. Once the enqueue is finished we enter the SMR critical section
> >>> again and drop the reference.
> >>>
> >>> To make it clear that the SMR_TAILQ remains intact while a refcount is
> >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> >>> not strictly needed since the next pointer remains valid up until the
> >>> smr_barrier() call but I find this a bit easier to understand.
> >>> First make sure nobody else holds a reference then remove the port from
> >>> the list.
> >>>
> >>> I currently have no test setup to verify this but I hope someone else can
> >>> give this a spin.
> >> Hi,
> >>
> >> for now veb seems stable and i can't panic box although it should, but
> >> please give me few more hours to torture it properly.
> >
> > I can trigger panic in veb with this diff.
> >
> > Thank you ..
>
> I can't trigger ... :))) sorry ..

sorry i'm late to the party. can you try this diff?

this diff replaces the list of ports with an array/map of ports.
the map takes references to all the ports, so the forwarding paths
just have to hold a reference to the map to be able to use all the
ports. the forwarding path uses smr to get hold of a map, takes a map
ref, and then leaves the smr crit section before iterating over the map
and pushing packets.

this means we should only take and release a single refcnt when
we're pushing packets out any number of ports.

if no span ports are configured, then there's no span port map and
we don't try and take a ref, we can just return early.
we also only take and release a single refcnt when we forward the
actual packet. forwarding to a single port provided by an etherbridge
lookup already takes/releases the single port ref. if it falls
through that for unknown unicast or broadcast/multicast, then it's
a single refcnt for the current map of all ports.

Index: if_veb.c
===================================================================
RCS file: /cvs/src/sys/net/if_veb.c,v
retrieving revision 1.25
diff -u -p -r1.25 if_veb.c
--- if_veb.c	4 Jan 2022 06:32:39 -0000	1.25
+++ if_veb.c	13 May 2022 02:01:43 -0000
@@ -139,13 +139,13 @@ struct veb_port {
 	struct veb_rule_list	p_vr_list[2];
 #define VEB_RULE_LIST_OUT		0
 #define VEB_RULE_LIST_IN		1
-
-	SMR_TAILQ_ENTRY(veb_port)	p_entry;
 };
 
 struct veb_ports {
-	SMR_TAILQ_HEAD(, veb_port)	l_list;
-	unsigned int			l_count;
+	struct refcnt			m_refs;
+	unsigned int			m_count;
+
+	/* followed by an array of veb_port pointers */
 };
 
 struct veb_softc {
@@ -155,8 +155,8 @@ struct veb_softc {
 	struct etherbridge	sc_eb;
 
 	struct rwlock		sc_rule_lock;
-	struct veb_ports	sc_ports;
-	struct veb_ports	sc_spans;
+	struct veb_ports	*sc_ports;
+	struct veb_ports	*sc_spans;
 };
 
 #define DPRINTF(_sc, fmt...)	do { \
@@ -184,8 +184,25 @@ static int	veb_p_ioctl(struct ifnet *, u
 static int	veb_p_output(struct ifnet *, struct mbuf *,
 		    struct sockaddr *, struct rtentry *);
 
-static void	veb_p_dtor(struct veb_softc *, struct veb_port *,
-		    const char *);
+static inline size_t
+veb_ports_size(unsigned int n)
+{
+	/* use of _ALIGN is inspired by CMSGs */
+	return _ALIGN(sizeof(struct veb_ports)) +
+	    n * sizeof(struct veb_port *);
+}
+
+static inline struct veb_port **
+veb_ports_array(struct veb_ports *m)
+{
+	return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m)));
+}
+
+static void	veb_ports_free(struct veb_ports *);
+
+static void	veb_p_unlink(struct veb_softc *, struct veb_port *);
+static void	veb_p_fini(struct veb_port *);
+static void	veb_p_dtor(struct veb_softc *, struct veb_port *);
 static int	veb_add_port(struct veb_softc *,
 		    const struct ifbreq *, unsigned int);
 static int	veb_del_port(struct veb_softc *,
@@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
 	return (ENOMEM);
 
 	rw_init(&sc->sc_rule_lock, "vebrlk");
-	SMR_TAILQ_INIT(&sc->sc_ports.l_list);
-	SMR_TAILQ_INIT(&sc->sc_spans.l_list);
+	sc->sc_ports = NULL;
+	sc->sc_spans = NULL;
 
 	ifp = &sc->sc_if;
 
@@ -314,7 +331,10 @@ static int
 veb_clone_destroy(struct ifnet *ifp)
 {
 	struct veb_softc *sc = ifp->if_softc;
-	struct
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> On 12.5.2022. 16:22, Hrvoje Popovski wrote:
>> On 12.5.2022. 14:48, Claudio Jeker wrote:
>>> I think the diff below may be enough to fix this issue. It drops the SMR
>>> critical section around the enqueue operation but uses a reference on the
>>> port instead to ensure that the device can't be removed during the
>>> enqueue. Once the enqueue is finished we enter the SMR critical section
>>> again and drop the reference.
>>>
>>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>>> not strictly needed since the next pointer remains valid up until the
>>> smr_barrier() call but I find this a bit easier to understand.
>>> First make sure nobody else holds a reference then remove the port from
>>> the list.
>>>
>>> I currently have no test setup to verify this but I hope someone else can
>>> give this a spin.
>> Hi,
>>
>> for now veb seems stable and i can't panic box although it should, but
>> please give me few more hours to torture it properly.
>
> I can trigger panic in veb with this diff.
>
> Thank you ..

I can't trigger ... :))) sorry ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> On 12.5.2022. 14:48, Claudio Jeker wrote:
>> I think the diff below may be enough to fix this issue. It drops the SMR
>> critical section around the enqueue operation but uses a reference on the
>> port instead to ensure that the device can't be removed during the
>> enqueue. Once the enqueue is finished we enter the SMR critical section
>> again and drop the reference.
>>
>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>> not strictly needed since the next pointer remains valid up until the
>> smr_barrier() call but I find this a bit easier to understand.
>> First make sure nobody else holds a reference then remove the port from
>> the list.
>>
>> I currently have no test setup to verify this but I hope someone else can
>> give this a spin.
> Hi,
>
> for now veb seems stable and i can't panic box although it should, but
> please give me few more hours to torture it properly.

I can trigger panic in veb with this diff.

Thank you ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> I think the diff below may be enough to fix this issue. It drops the SMR
> critical section around the enqueue operation but uses a reference on the
> port instead to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
>
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
>
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

I agree this diff is far better than my version. I had not realized we
can actually do a dance with smr_read_enter()/smr_read_leave(). I
believe it should work and fix the problem. Here is my OK (if my OK
counts here).
OK sashan

> --
> :wq Claudio
>
> Index: if_veb.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_veb.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 if_veb.c
> --- if_veb.c	4 Jan 2022 06:32:39 -0000	1.25
> +++ if_veb.c	12 May 2022 12:06:28 -0000
> @@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb
>  			continue;
>  		}
>  
> +		veb_eb_brport_take(p);
> +		smr_read_leave();
>  		if_enqueue(ifp0, m); /* XXX count error */
> +		smr_read_enter();
> +		veb_eb_brport_rele(p);
>  	}
>  	smr_read_leave();
>  }
> @@ -904,7 +908,12 @@ veb_broadcast(struct veb_softc *sc, stru
>  			continue;
>  		}
>  
> +
> +		veb_eb_brport_take(tp);
> +		smr_read_leave();
>  		(*tp->p_enqueue)(ifp0, m); /* XXX count error */
> +		smr_read_enter();
> +		veb_eb_brport_rele(tp);
>  	}
>  	smr_read_leave();
>  
> @@ -1934,10 +1943,11 @@ veb_p_dtor(struct veb_softc *sc, struct
>  
>  		port_list = &sc->sc_ports;
>  	}
> +	refcnt_finalize(&p->p_refs, "vebpdtor");
> +
>  	SMR_TAILQ_REMOVE_LOCKED(&port_list->l_list, p, p_entry);
>  	port_list->l_count--;
>  
> -	refcnt_finalize(&p->p_refs, "vebpdtor");
>  	smr_barrier();
>  
>  	veb_rule_list_free(TAILQ_FIRST(&p->p_vrl));
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 14:48, Claudio Jeker wrote:
> I think the diff below may be enough to fix this issue. It drops the SMR
> critical section around the enqueue operation but uses a reference on the
> port instead to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
>
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
>
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

Hi,

for now veb seems stable and i can't panic box although it should, but
please give me few more hours to torture it properly.

I'm doing this in a loop:

ifconfig veb1 destroy
sh /etc/netstart
ifconfig veb0 destroy
sh /etc/netstart
ifconfig vport1 destroy
sh /etc/netstart
ifconfig vport0 destroy
sh /etc/netstart

my config

veb1: flags=a843 index 25 llprio 3
	groups: veb
	ix1 flags=3 port 2 ifpriority 0 ifcost 0
	vport1 flags=3 port 27 ifpriority 0 ifcost 0
veb0: flags=a843 index 26 llprio 3
	groups: veb
	ix0 flags=3 port 1 ifpriority 0 ifcost 0
	vport0 flags=3 port 28 ifpriority 0 ifcost 0
	ix2 flags=100
vport1: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:fa
	index 27 priority 0 llprio 3
	groups: vport
	inet 192.168.111.11 netmask 0xff00 broadcast 192.168.111.255
vport0: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:f8
	index 28 priority 0 llprio 3
	groups: vport
	inet 192.168.100.11 netmask 0xff00 broadcast 192.168.100.255
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 11:01:21AM +0200, Alexandr Nedvedicky wrote: > Hello Hrvoje, > > thank you for testing. > On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote: > > On 10.5.2022. 22:55, Alexander Bluhm wrote: > > > Yes. It is similar. > > > > > > I have read the whole mail thread and the final fix got commited. > > > But it looks incomplete, pf is still sleeping. > > > > > > Hrvoje, can you run the tests again that triggered the panics a > > > year ago? > > > > Hi, > > > > year ago panics was triggered when veb or tpmr bridged traffic. I've > > tried that right now and I couldn't trigger that panics. > > If I put vport and route traffic over veb than I can trigger panic with > > or without vlans as child-iface for veb. > > For panic i need to have pf enabled and to run > > ifconfig veb destroy or ifconfig vlan destroy and sh netstart in loop. > > > > > > this is panic without vlans > > > > ddb{4}> show all locks > > CPU 4: > > exclusive sched_lock _lock r = 0 (0x824819c0) > > #0 witness_lock+0x311 > > #1 sleep_setup+0xa5 > > #2 rw_enter+0x211 > > #3 pf_test+0xcf0 > > #4 ip_output+0x6b7 > > #5 ip_forward+0x2da > > #6 ip_input_if+0x353 > > #7 ipv4_input+0x39 > > #8 ether_input+0x3ad > > #9 vport_if_enqueue+0x19 > > #10 veb_port_input+0x529 > > I suspect we deal with broadcast and this time things start > to go downhill here in veb_broadcast(): > > 875 smr_read_enter(); > 876 SMR_TAILQ_FOREACH(tp, >sc_ports.l_list, p_entry) { > 877 if (rp == tp || (rp->p_protected & tp->p_protected)) { > 878 /* > 879 * don't let Ethernet packets hairpin or > 880 * move between ports in the same protected > 881 * domain(s). 
> 882 */ > 883 continue; > 884 } > 885 > 886 ifp0 = tp->p_ifp0; > 887 if (!ISSET(ifp0->if_flags, IFF_RUNNING)) { > 888 /* don't waste time */ > 889 continue; > 890 } > 891 > 892 if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) && > 893 !ISSET(m0->m_flags, M_BCAST | M_MCAST)) { > 894 /* don't flood unknown unicast */ > 895 continue; > 896 } > 897 > 898 if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0, src, dst)) > 899 continue; > 900 > 901 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT); > 902 if (m == NULL) { > 903 /* XXX count error? */ > 904 continue; > 905 } > 906 > 907 (*tp->p_enqueue)(ifp0, m); /* XXX count error */ > 908 } > 909 smr_read_leave(); > > the veb_broadcast() is being called from veb_port_input(): > 1082 > 1083 /* unknown unicast address */ > 1084 } else { > 1085 SET(m->m_flags, ETH64_IS_BROADCAST(dst) ? M_BCAST : > M_MCAST); > 1086 } > 1087 > 1088 veb_broadcast(sc, p, m, src, dst); > 1089 return (NULL); > > I think I've spotted the place yesterday night in code, but could not > figure out how to wire things up to trigger the panic. Fortunately you've > managed to figure it out. > > I think vport_if_enqueue() is a trap/land mine here. The function should > really > be queueing a packet instead of dispatching it right away. > I think the diff below may be enough to fix this issue. It drops the SMR critical secition around the enqueue operation but uses a reference on the port insteadt to ensure that the device can't be removed during the enqueue. Once the enqueue is finished we enter the SMR critical section again and drop the reference. To make it clear that the SMR_TAILQ remains intact while a refcount is held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is not strictly needed since the next pointer remains valid up until the smr_barrier() call but I find this a bit easier to understand. First make sure nobody else holds a reference then remove the port from the list. 
I currently have no test setup to verify this but I hope someone else can give this a spin. -- :wq Claudio Index: if_veb.c === RCS file: /cvs/src/sys/net/if_veb.c,v retrieving revision 1.25 diff -u -p -r1.25 if_veb.c --- if_veb.c4 Jan 2022 06:32:39 - 1.25 +++ if_veb.c12 May 2022 12:06:28 - @@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb continue; } + veb_eb_brport_take(p); + smr_read_leave(); if_enqueue(ifp0, m); /* XXX count error */ + smr_read_enter(); +
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Hrvoje, thank you for testing.

On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote:
> On 10.5.2022. 22:55, Alexander Bluhm wrote:
> > Yes. It is similar.
> >
> > I have read the whole mail thread and the final fix got committed.
> > But it looks incomplete, pf is still sleeping.
> >
> > Hrvoje, can you run the tests again that triggered the panics a
> > year ago?
>
> Hi,
>
> a year ago panics were triggered when veb or tpmr bridged traffic. I've
> tried that right now and I couldn't trigger those panics.
> If I put vport and route traffic over veb then I can trigger a panic with
> or without vlans as child-iface for veb.
> For the panic I need to have pf enabled and to run
> ifconfig veb destroy or ifconfig vlan destroy and sh netstart in a loop.
>
> this is panic without vlans
>
> ddb{4}> show all locks
> CPU 4:
> exclusive sched_lock &sched_lock r = 0 (0x824819c0)
> #0 witness_lock+0x311
> #1 sleep_setup+0xa5
> #2 rw_enter+0x211
> #3 pf_test+0xcf0
> #4 ip_output+0x6b7
> #5 ip_forward+0x2da
> #6 ip_input_if+0x353
> #7 ipv4_input+0x39
> #8 ether_input+0x3ad
> #9 vport_if_enqueue+0x19
> #10 veb_port_input+0x529

I suspect we deal with broadcast, and this time things start to go downhill
here in veb_broadcast():

 875 	smr_read_enter();
 876 	SMR_TAILQ_FOREACH(tp, &sc->sc_ports.l_list, p_entry) {
 877 		if (rp == tp || (rp->p_protected & tp->p_protected)) {
 878 			/*
 879 			 * don't let Ethernet packets hairpin or
 880 			 * move between ports in the same protected
 881 			 * domain(s).
 882 			 */
 883 			continue;
 884 		}
 885
 886 		ifp0 = tp->p_ifp0;
 887 		if (!ISSET(ifp0->if_flags, IFF_RUNNING)) {
 888 			/* don't waste time */
 889 			continue;
 890 		}
 891
 892 		if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) &&
 893 		    !ISSET(m0->m_flags, M_BCAST | M_MCAST)) {
 894 			/* don't flood unknown unicast */
 895 			continue;
 896 		}
 897
 898 		if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0, src, dst))
 899 			continue;
 900
 901 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 902 		if (m == NULL) {
 903 			/* XXX count error? */
 904 			continue;
 905 		}
 906
 907 		(*tp->p_enqueue)(ifp0, m); /* XXX count error */
 908 	}
 909 	smr_read_leave();

veb_broadcast() is being called from veb_port_input():

1082
1083 	/* unknown unicast address */
1084 	} else {
1085 		SET(m->m_flags, ETH64_IS_BROADCAST(dst) ? M_BCAST : M_MCAST);
1086 	}
1087
1088 	veb_broadcast(sc, p, m, src, dst);
1089 	return (NULL);

I think I spotted the place in the code yesterday night, but could not figure
out how to wire things up to trigger the panic. Fortunately you've managed to
figure it out.

I think vport_if_enqueue() is a trap/land mine here. The function should
really be queueing a packet instead of dispatching it right away.

thanks and regards
sashan
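The pattern under discussion in this thread — take references while inside the SMR read section, and defer the delivery (which may sleep) until after leaving it — can be modelled in userspace. The sketch below is illustrative only, not kernel code; all names in it (`broadcast`, `deliver`, `in_read_section`) are invented, and the comments note which kernel primitives they stand in for.

```c
#include <assert.h>
#include <stddef.h>

struct port {
	int refs;	/* models veb_eb_brport_take()/_rele() counting */
	int running;	/* models IFF_RUNNING */
	int delivered;
};

static int in_read_section;	/* models being inside smr_read_enter() */

static void
deliver(struct port *p)
{
	/*
	 * stands in for if_enqueue()/if_vinput(): it may sleep,
	 * so it must never run while the read section is held
	 */
	assert(in_read_section == 0);
	p->delivered++;
}

int
broadcast(struct port *ports, size_t n)
{
	struct port *todo[16];	/* collected references, bounded for the sketch */
	size_t i, ntodo = 0;

	in_read_section = 1;			/* smr_read_enter() */
	for (i = 0; i < n && ntodo < 16; i++) {
		if (!ports[i].running)
			continue;
		ports[i].refs++;		/* take a reference, no sleep */
		todo[ntodo++] = &ports[i];
	}
	in_read_section = 0;			/* smr_read_leave() */

	/* now it is safe to do work that may sleep */
	for (i = 0; i < ntodo; i++) {
		deliver(todo[i]);
		todo[i]->refs--;		/* drop the reference */
	}
	return (int)ntodo;
}
```

The key property is that no call that can sleep happens between the enter and leave markers; the references keep the ports alive until delivery is done.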
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 10:29:57AM +0200, Claudio Jeker wrote:
> On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > > Can we limit the number of span ports per bridge to a small number so
> > > that instead of a heap object for the SLIST a simple stack array of
> > > MAX_SPAN_PORTS pointers could be used?
> > >
> > > Who needs more than a handful of span ports per veb?
> >
> > I just want to make sure I follow your idea...
> >
> > > > 8<---8<---8<--8<
> > > > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > > > index 2976cc200f1..a02dbac887f 100644
> >
> > > > +	struct veb_span_port *sp;
> > > > +	SLIST_HEAD(, veb_span_port) span_list;
> >
> > so span_list will be turned into an array of references, right?
> >
> > > >
> > > > +	SLIST_INIT(&span_list);
> > > > 	smr_read_enter();
> > > > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > > > 		ifp0 = p->p_ifp0;
> > > > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > > > 			continue;
> > > >
> > > > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > > > -		if (m == NULL) {
> > > > -			/* XXX count error */
> > > > -			continue;
> > > > -		}
> > > > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > > > +		if (sp == NULL)
> > > > +			continue;	/* XXX count error */
> > > >
> > > > -		if_enqueue(ifp0, m);	/* XXX count error */
> > > > +		veb_eb_brport_take(p);
> > > > +		sp->sp_port = p;
> > > > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
> >
> > here we do instead of SLIST_INSERT_HEAD() something like:
> > 	span_list[i++] = p;
> >
> > > > 	}
> > > > 	smr_read_leave();
> > > > +
> > > > +	while (!SLIST_EMPTY(&span_list)) {
> >
> > and here instead of walking the list we just walk through the array.
>
> Yes, this is my basic idea.
>
> > is that somewhat correct? If so then I need some 'qualified' suggestion
> > for the upper limit of span ports. I'm afraid my real life experience
> > with network switches is zero, so I don't feel like the right person.
>
> I would suggest an upper limit of 8. For example it seems Cisco only
> supports 2 span sessions so 8 should be plenty.

Now looking at the original panic again I realized that the problem is
probably not veb_span() but actually veb_broadcast(). Both functions suffer
from a similar issue: the smr block covers too much. For veb_broadcast the
trick with the limit will not work. Using a service task just for broadcast
is also not a good option since it reorders packets.

> > another idea I'm toying with is to add a 'service task' to every switch,
> > so the underlying `vport_if_enqueue()` will put the packet to a queue and
> > let the task dispatch it to the ifp associated with the vport. Once the
> > packet is dispatched the task will drop the reference to the port.
>
> Need to look more into this but in general I dislike adding more queues
> and tasks. Such a setup would just synchronize everything back to a single
> thread.

-- 
:wq Claudio
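The bounded-array alternative proposed above can be sketched in userspace as follows. This is a hedged illustration, not the kernel implementation: `MAX_SPAN_PORTS`, `span_snap` and `span_collect` are made-up names, and the real code would take and later release port references on each stored pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPAN_PORTS 8	/* the upper limit suggested in the thread */

struct span_snap {
	void *ports[MAX_SPAN_PORTS];	/* snapshot taken under the read section */
	size_t n;
	unsigned long overflowed;	/* ports skipped because the array was full */
};

size_t
span_collect(struct span_snap *s, void **all, size_t nall)
{
	size_t i;

	s->n = 0;
	s->overflowed = 0;

	/* smr_read_enter() would go here in the real code */
	for (i = 0; i < nall; i++) {
		if (s->n < MAX_SPAN_PORTS)
			s->ports[s->n++] = all[i];
		else
			s->overflowed++;	/* XXX count error */
	}
	/* smr_read_leave(); afterwards m_dup_pkt() + if_enqueue() per entry */

	return s->n;
}
```

The attraction is that no allocation can fail inside the read section; the cost is the hard cap, which is why the thread debates what limit is acceptable.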
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 10.5.2022. 22:55, Alexander Bluhm wrote:
> Yes. It is similar.
>
> I have read the whole mail thread and the final fix got committed.
> But it looks incomplete, pf is still sleeping.
>
> Hrvoje, can you run the tests again that triggered the panics a
> year ago?

Hi,

a year ago panics were triggered when veb or tpmr bridged traffic. I've
tried that right now and I couldn't trigger those panics.
If I put vport and route traffic over veb then I can trigger a panic with
or without vlans as child-iface for veb.
For the panic I need to have pf enabled and to run
ifconfig veb destroy or ifconfig vlan destroy and sh netstart in a loop.

this is panic without vlans

panic: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/sys/kern/subr_xxx.c", line 163
Stopped at db_enter+0x10: popq %rbp
 TID PID UID PRFLAGS PFLAGS CPU COMMAND
 57981 52408 0 0x14000 0x2003 softnet
 18952 62179 0 0x14000 0x2005 softnet
db_enter() at db_enter+0x10
panic(81f36a34) at panic+0xbf
__assert(81faa7fa,81fd47a7,a3,81fe7c9d) at __assert+0x25
assertwaitok() at assertwaitok+0xcc
mi_switch() at mi_switch+0x40
sleep_finish(800022c707a0,1) at sleep_finish+0x10b
rw_enter(822b3ad8,2) at rw_enter+0x232
pf_test(2,3,800c6048,800022c70a58) at pf_test+0xcf0
ip_output(fd80a32f1f00,0,800022c70be8,1,0,0,e8e0f1a7c10273fe) at ip_output+0x6b7
ip_forward(fd80a32f1f00,814ee000,fd83a8657078,0) at ip_forward+0x2da
ip_input_if(800022c70d28,800022c70d34,4,0,814ee000) at ip_input_if+0x353
ipv4_input(814ee000,fd80a32f1f00) at ipv4_input+0x39
ether_input(814ee000,fd80a32f1f00) at ether_input+0x3ad
vport_if_enqueue(814ee000,fd80a32f1f00) at vport_if_enqueue+0x19
end trace frame: 0x800022c70e70, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs.
ddb{4}> ddb{4}> show reg rdi0 rsi 0x14 rbp 0x800022c705f0 rbx 0x800022424ff0 rdx 0x8000 rcx0x286 rax 0x7d r8 0x101010101010101 r9 0 r10 0x5b4ef42a9c796b43 r11 0xada7e964a691819f r12 0x800022425a60 r13 0x800022c450a0 r140 r15 0x81f36a34cmd0646_9_tim_udma+0x2d9d2 rip 0x81c01c30db_enter+0x10 cs 0x8 rflags 0x286 rsp 0x800022c705f0 ss 0x10 db_enter+0x10: popq%rbp ddb{4}> ddb{4}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 14129 480066 92457 0 3 0x3 netlock ifconfig 92457 504149 2002 0 30x10008b sigsusp sh 2002 26492 1517 0 30x10008b sigsusp sh 1517 131574 1 0 30x10008b sigsusp ksh 26252 251094 1 0 30x100098 kqreadcron 20251 457205 97875 95 3 0x1100092 kqreadsmtpd 62139 255853 97875103 3 0x1100092 kqreadsmtpd 29505 64154 97875 95 3 0x1100092 kqreadsmtpd 20035 471489 97875 95 30x100092 kqreadsmtpd 91114 73268 97875 95 3 0x1100092 kqreadsmtpd 78396 414422 97875 95 3 0x1100092 kqreadsmtpd 97875 113010 1 0 30x100080 kqreadsmtpd 21916 226987 1 0 30x88 kqreadsshd 90174247 1 0 30x100080 kqreadntpd 72358 391459 38133 83 30x100092 kqreadntpd 38133 355054 1 83 3 0x1100012 netlock ntpd 91824 285625 60194 73 3 0x1100090 kqreadsyslogd 60194 367623 1 0 30x100082 netio syslogd 73270 113983 0 0 3 0x14200 bored smr 51379 478537 0 0 3 0x14200 pgzerozerothread 85386 54454 0 0 3 0x14200 aiodoned aiodoned 10937 491268 0 0 3 0x14200 syncerupdate 85008 360847 0 0 3 0x14200 cleaner cleaner 76642 501363 0 0 3 0x14200 reaperreaper 32934 257878 0 0 3 0x14200 pgdaemon pagedaemon 48583 371156 0 0 3 0x14200 usbtskusbtask 53660 310701 0 0 3 0x14200 usbatsk usbatsk 19211 31258 0 0 3 0x40014200 acpi0 acpi0 11856 305318 0 0 3 0x40014200idle5 9933 290633 0 0 3 0x40014200idle4 41570 94891 0 0 3 0x40014200
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> > Can we limit the number of span ports per bridge to a small number so that
> > instead of a heap object for the SLIST a simple stack array of
> > MAX_SPAN_PORTS pointers could be used?
> >
> > Who needs more than a handful of span ports per veb?
>
> I just want to make sure I follow your idea...
>
> > > 8<---8<---8<--8<
> > > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > > index 2976cc200f1..a02dbac887f 100644
>
> > > +	struct veb_span_port *sp;
> > > +	SLIST_HEAD(, veb_span_port) span_list;
>
> so span_list will be turned into an array of references, right?
>
> > >
> > > +	SLIST_INIT(&span_list);
> > > 	smr_read_enter();
> > > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > > 		ifp0 = p->p_ifp0;
> > > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > > 			continue;
> > >
> > > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > > -		if (m == NULL) {
> > > -			/* XXX count error */
> > > -			continue;
> > > -		}
> > > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > > +		if (sp == NULL)
> > > +			continue;	/* XXX count error */
> > >
> > > -		if_enqueue(ifp0, m);	/* XXX count error */
> > > +		veb_eb_brport_take(p);
> > > +		sp->sp_port = p;
> > > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
>
> here we do instead of SLIST_INSERT_HEAD() something like:
> 	span_list[i++] = p;
>
> > > 	}
> > > 	smr_read_leave();
> > > +
> > > +	while (!SLIST_EMPTY(&span_list)) {
>
> and here instead of walking the list we just walk through the array.

Yes, this is my basic idea.

> is that somewhat correct? If so then I need some 'qualified' suggestion
> for the upper limit of span ports. I'm afraid my real life experience
> with network switches is zero, so I don't feel like the right person.

I would suggest an upper limit of 8. For example it seems Cisco only
supports 2 span sessions so 8 should be plenty.

> another idea I'm toying with is to add a 'service task' to every switch,
> so the underlying `vport_if_enqueue()` will put the packet to a queue and
> let the task dispatch it to the ifp associated with the vport. Once the
> packet is dispatched the task will drop the reference to the port.

Need to look more into this but in general I dislike adding more queues
and tasks. Such a setup would just synchronize everything back to a single
thread.

-- 
:wq Claudio
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> Can we limit the number of span ports per bridge to a small number so that
> instead of a heap object for the SLIST a simple stack array of
> MAX_SPAN_PORTS pointers could be used?
>
> Who needs more than a handful of span ports per veb?

I just want to make sure I follow your idea...

> > 8<---8<---8<--8<
> > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > index 2976cc200f1..a02dbac887f 100644

> > +	struct veb_span_port *sp;
> > +	SLIST_HEAD(, veb_span_port) span_list;

so span_list will be turned into an array of references, right?

> >
> > +	SLIST_INIT(&span_list);
> > 	smr_read_enter();
> > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > 		ifp0 = p->p_ifp0;
> > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > 			continue;
> >
> > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > -		if (m == NULL) {
> > -			/* XXX count error */
> > -			continue;
> > -		}
> > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > +		if (sp == NULL)
> > +			continue;	/* XXX count error */
> >
> > -		if_enqueue(ifp0, m);	/* XXX count error */
> > +		veb_eb_brport_take(p);
> > +		sp->sp_port = p;
> > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);

here we do instead of SLIST_INSERT_HEAD() something like:
	span_list[i++] = p;

> > 	}
> > 	smr_read_leave();
> > +
> > +	while (!SLIST_EMPTY(&span_list)) {

and here instead of walking the list we just walk through the array.

is that somewhat correct? If so then I need some 'qualified' suggestion
for the upper limit of span ports. I'm afraid my real life experience
with network switches is zero, so I don't feel like the right person.

another idea I'm toying with is to add a 'service task' to every switch,
so the underlying `vport_if_enqueue()` will put the packet to a queue and
let the task dispatch it to the ifp associated with the vport. Once the
packet is dispatched the task will drop the reference to the port.

I had no time to figure out how to wire things up for the task. I might
give it a try tonight.

thanks and regards
sashan
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Tue, May 10, 2022 at 12:21:02AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> > Hello,
> >
> > I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> > My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> > Also I've used 7.1 for a limited time and there were no crash.
> > After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
> > Here crash report and dmesg:
> >
> > ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> > ether_input(851c3800,fd806064c200) at ether_input+0x100
> > end trace frame: 0x800025575290, count: 0
> > ddb{1}> show panic
> >
> > *cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
> >
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(81f22e39) at panic+0xbf
> > __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
>
> diff below attempts to fix this particular panic triggered by the
> veb_span() function. This is a fairly simple/straightforward change:
>
> 	we grab references to veb ports inside the SMR read section.
> 	we keep those references in a singly linked list.
> 	as soon as we leave the SMR read section we process the list:
> 		dispatch packets
> 		drop references to ports
>
> The change may uncover similar panics in other veb/bridge areas.
>
> diff applies to current
>
> thanks for testing and reporting back.

Can we limit the number of span ports per bridge to a small number so that
instead of a heap object for the SLIST a simple stack array of
MAX_SPAN_PORTS pointers could be used?

Who needs more than a handful of span ports per veb?
> 8<---8<---8<--8<
> diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> index 2976cc200f1..a02dbac887f 100644
> --- a/sys/net/if_veb.c
> +++ b/sys/net/if_veb.c
> @@ -159,6 +159,11 @@ struct veb_softc {
>  	struct veb_ports sc_spans;
>  };
> 
> +struct veb_span_port {
> +	SLIST_ENTRY(veb_span_port) sp_entry;
> +	struct veb_port *sp_port;
> +};
> +
>  #define DPRINTF(_sc, fmt...)	do { \
>  	if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
>  		printf(fmt); \
> @@ -225,6 +230,7 @@ static struct if_clone veb_cloner =
>      IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
> 
>  static struct pool veb_rule_pool;
> +static struct pool span_port_pool;
> 
>  static int	vport_clone_create(struct if_clone *, int);
>  static int	vport_clone_destroy(struct ifnet *);
> @@ -266,6 +272,11 @@ veb_clone_create(struct if_clone *ifc, int unit)
>  		    0, IPL_SOFTNET, 0, "vebrpl", NULL);
>  	}
> 
> +	if (span_port_pool.pr_size == 0) {
> +		pool_init(&span_port_pool, sizeof(struct veb_span_port),
> +		    0, IPL_SOFTNET, 0, "vebspl", NULL);
> +	}
> +
>  	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
>  	if (sc == NULL)
>  		return (ENOMEM);
> @@ -352,22 +363,38 @@ veb_span(struct veb_softc *sc, struct mbuf *m0)
>  	struct veb_port *p;
>  	struct ifnet *ifp0;
>  	struct mbuf *m;
> +	struct veb_span_port *sp;
> +	SLIST_HEAD(, veb_span_port) span_list;
> 
> +	SLIST_INIT(&span_list);
>  	smr_read_enter();
>  	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  		ifp0 = p->p_ifp0;
>  		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  			continue;
> 
> -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> -		if (m == NULL) {
> -			/* XXX count error */
> -			continue;
> -		}
> +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> +		if (sp == NULL)
> +			continue;	/* XXX count error */
> 
> -		if_enqueue(ifp0, m);	/* XXX count error */
> +		veb_eb_brport_take(p);
> +		sp->sp_port = p;
> +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
>  	}
>  	smr_read_leave();
> +
> +	while (!SLIST_EMPTY(&span_list)) {
> +		sp = SLIST_FIRST(&span_list);
> +		SLIST_REMOVE_HEAD(&span_list, sp_entry);
> +
> +		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> +		if (m != NULL)
> +			if_enqueue(sp->sp_port->p_ifp0, m);
> +		/* XXX count error */
> +
> +		veb_eb_brport_rele(sp->sp_port);
> +		pool_put(&span_port_pool, sp);
> +	}
>  }
> 
>  static int

-- 
:wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 12:38:56AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> > Yes. It is similar.
> >
> > I have read the whole mail thread and the final fix got committed.
> > But it looks incomplete, pf is still sleeping.
> >
> > Hrvoje, can you run the tests again that triggered the panics a
> > year ago?
> >
> > Sasha, I still think the way to go is mutex for pf locks. I don't
> > see a performance impact.
>
> you mean performance impact against 7.1? If that is the case,
> then I agree.
>
> > a single CPU anyway. And with sleeping locks you have to schedule
> > in the hot packet path. Our scheduler was never built for that.
>
> I don't think it is a problem of the scheduler. I rather see it as a
> problem of the lock primitives, which can't keep consistency across a
> switch from the CPU (a.k.a. sleep).
>
> > At genua we started with mutex, made it fine grained, and converted
> > to rcu later.
>
> I think the only reason why PF_LOCK() is a writer lock is overload tables.
> A packet which is about to update an overload table requires an exclusive
> PF_LOCK(). As soon as there is a way to update an overload table without
> an exclusive lock, we can turn packets into PF_LOCK() readers.
>
> Also I somewhat disagree with the idea 'the network stack must never
> sleep'. The rw-lock adds places to the network subsystem where the
> scheduler has a chance to give the CPU to someone else.
>
> on the other hand let's stay real. if we want to keep at least part of the
> network stack running in parallel these days, then pf locks need to be
> turned into mutexes, unless we want to hunt down all those SMR read
> sections and fix/work around them to deal with sleeps in the underlying
> call stack.
>
> I think either way (putting mutexes in vs. hunting down sleeping SMR
> sections) is fine for me. My current plan is to move all those memory
> allocations in the ioctl(2) path outside of PF_LOCK() scope.

I do not fully agree with this. SMR sections should be short and not span
deep callstacks.
This is not a good design and finding and fixing these errors is important. -- :wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> Yes. It is similar.
>
> I have read the whole mail thread and the final fix got committed.
> But it looks incomplete, pf is still sleeping.
>
> Hrvoje, can you run the tests again that triggered the panics a
> year ago?
>
> Sasha, I still think the way to go is mutex for pf locks. I don't
> see a performance impact.

you mean performance impact against 7.1? If that is the case, then I agree.

> a single CPU anyway. And with sleeping locks you have to schedule
> in the hot packet path. Our scheduler was never built for that.

I don't think it is a problem of the scheduler. I rather see it as a problem
of the lock primitives, which can't keep consistency across a switch from the
CPU (a.k.a. sleep).

> At genua we started with mutex, made it fine grained, and converted
> to rcu later.

I think the only reason why PF_LOCK() is a writer lock is overload tables. A
packet which is about to update an overload table requires an exclusive
PF_LOCK(). As soon as there is a way to update an overload table without an
exclusive lock, we can turn packets into PF_LOCK() readers.

Also I somewhat disagree with the idea 'the network stack must never sleep'.
The rw-lock adds places to the network subsystem where the scheduler has a
chance to give the CPU to someone else.

on the other hand let's stay real. if we want to keep at least part of the
network stack running in parallel these days, then pf locks need to be turned
into mutexes, unless we want to hunt down all those SMR read sections and
fix/work around them to deal with sleeps in the underlying call stack.

I think either way (putting mutexes in vs. hunting down sleeping SMR
sections) is fine for me. My current plan is to move all those memory
allocations in the ioctl(2) path outside of PF_LOCK() scope.

thanks and regards
sashan
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote:
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
> > Can some veb or smr hacker explain how this is supposed to work?
> >
> > Sleeping in pf is also not ideal as it is in the hot path and slows
> > down packets. But that is not easy to fix as we have to refactor
> > the memory allocations before converting the pf lock to a mutex.
> > sashan@ is working on that.
>
> Hi,
>
> isn't that similar or the same panic that was talked about in the
> "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

Yes. It is similar.

I have read the whole mail thread and the final fix got committed.
But it looks incomplete, pf is still sleeping.

Hrvoje, can you run the tests again that triggered the panics a year ago?

Sasha, I still think the way to go is mutex for pf locks. I don't see a
performance impact. Without it, we can run the network only on a single CPU
anyway. And with sleeping locks you have to schedule in the hot packet path.
Our scheduler was never built for that.

At genua we started with mutex, made it fine grained, and converted to rcu
later.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote:
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
> > Can some veb or smr hacker explain how this is supposed to work?
> >
> > Sleeping in pf is also not ideal as it is in the hot path and slows
> > down packets. But that is not easy to fix as we have to refactor
> > the memory allocations before converting pf lock to a mutex. sashan@
> > is working on that.

the rw-lock in pf is indeed a next step. The first step is to move memory
allocations done by ioctl(2) out of the network path. The same goes for
copyin()/copyout(). I'm currently looking at art tables. In theory it should
be possible to replace the current radix tables with art tables. If I
understand things right, this would also solve locking on update.

> Hi,
>
> isn't that similar or the same panic that was talked about in the
> "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

this is very similar to the panics you've found. However this time packets
sneak into the sleeping path via a call to veb_span(). I agree the fix looks
odd. Maybe another way around it would be to add a task to every veb switch.
The task would dispatch packets to the slow path.

regards
sashan
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
> On 10 May 2022, at 22:37, Hrvoje Popovski wrote:
>
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
>> Can some veb or smr hacker explain how this is supposed to work?
>>
>> Sleeping in pf is also not ideal as it is in the hot path and slows
>> down packets. But that is not easy to fix as we have to refactor
>> the memory allocations before converting pf lock to a mutex. sashan@
>> is working on that.
>
> Hi,
>
> isn't that similar or same panic that was talked about in "parallel
> forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

Yes, both panics were caused by a sleep attempt within an SMR section.
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 9.5.2022. 22:04, Alexander Bluhm wrote: > Can some veb or smr hacker explain how this is supposed to work? > > Sleeping in pf is also not ideal as it is in the hot path and slows > down packets. But that is not easy to fix as we have to refactor > the memory allocations before converting pf lock to a mutex. sashan@ > is working on that. Hi, isn't that similar or same panic that was talked about in "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@ https://www.mail-archive.com/tech@openbsd.org/msg64040.html
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Thanks for your support. I'll try to test when you get it done.

On Mon, May 9, 2022 at 8:51 PM Alexandr Nedvedicky <
alexandr.nedvedi...@oracle.com> wrote:

> Hello Barbaros,
>
> thank you for testing and excellent report.
>
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(81f22e39) at panic+0xbf
> > __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> > assertwaitok() at assertwaitok+0xcc
> > mi_switch() at mi_switch+0x40
>
> the assert indicates we attempt to sleep inside an SMR section,
> which must be avoided.
>
> > sleep_finish(800025574da0,1) at sleep_finish+0x10b
> > rw_enter(822cfe50,1) at rw_enter+0x1cb
> > pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> > ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> > ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> > ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> > ether_input(851c3800,fd806064c200) at ether_input+0x100
> > vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> > ether_input(8095a050,fd806064c200) at ether_input+0x85
> > if_input_process(8095a050,800025575358) at if_input_process+0x6f
> > ifiq_process(8095a460) at ifiq_process+0x69
> > taskq_thread(80035080) at taskq_thread+0x100
>
> above is the call stack which has done the bad thing (sleeping inside an
> SMR section)
>
> in my opinion the primary suspect is veb_port_input(), whose code reads as
> follows:
>
>  966 static struct mbuf *
>  967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
>  968 {
>  969 	struct veb_port *p = brport;
>  970 	struct veb_softc *sc = p->p_veb;
>  971 	struct ifnet *ifp = &sc->sc_if;
>  972 	struct ether_header *eh;
> ...
> 1021 	counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
> 1022 	    m->m_pkthdr.len);
> 1023
> 1024 	/* force packets into the one routing domain for pf */
> 1025 	m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
> 1026
> 1027 #if NBPFILTER > 0
> 1028 	if_bpf = READ_ONCE(ifp->if_bpf);
> 1029 	if (if_bpf != NULL) {
> 1030 		if (bpf_mtap_ether(if_bpf, m, 0) != 0)
> 1031 			goto drop;
> 1032 	}
> 1033 #endif
> 1034
> 1035 	veb_span(sc, m);
> 1036
> 1037 	if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
> 1038 	    veb_ip_filter(m))
> 1039 		goto drop;
> 1040
> 1041 	if (!ISSET(ifp->if_flags, IFF_LINK0) &&
> 1042 	    veb_vlan_filter(m))
> 1043 		goto drop;
> 1044
> 1045 	if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
> 1046 		goto drop;
>
> the call to veb_span() at line 1035 seems to be our culprit (in my
> opinion):
>
>  356 	smr_read_enter();
>  357 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  358 		ifp0 = p->p_ifp0;
>  359 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  360 			continue;
>  361
>  362 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
>  363 		if (m == NULL) {
>  364 			/* XXX count error */
>  365 			continue;
>  366 		}
>  367
>  368 		if_enqueue(ifp0, m);	/* XXX count error */
>  369 	}
>  370 	smr_read_leave();
>
> the loop above comes from veb_span(), which calls if_enqueue() from within
> an SMR section. Line 368 calls here:
>
> 2191 static int
> 2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
> 2193 {
> 2194 	/*
> 2195 	 * switching an l2 packet toward a vport means pushing it
> 2196 	 * into the network stack. this function exists to make
> 2197 	 * if_vinput compat with veb calling if_enqueue.
> 2198 	 */
> 2199
> 2200 	if_vinput(ifp, m);
> 2201
> 2202 	return (0);
> 2203 }
>
> which in turn calls if_vinput(), which calls further down into the IP
> stack, and the IP stack may sleep. We must change veb_span() so that calls
> to if_vinput() happen outside of the SMR section.
>
> I don't have such a complex setup to use vlans and virtual ports. I'll try
> to cook up a diff and pass it to you for testing.
>
> thanks again for coming back to us with the report.
>
> regards
> sashan
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hi,

I'm not a fan of this.

> +	if (span_port_pool.pr_size == 0) {
> +		pool_init(&span_port_pool, sizeof(struct veb_span_port),
> +		    0, IPL_SOFTNET, 0, "vebspl", NULL);
> +	}

Does an initialized pool consume significant resources? Why don't we do this
within vebattach()? This is also true for the `veb_rule_pool' initialization.
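The lazy-initialization guard being questioned here can be modelled with a tiny standalone sketch. The names below loosely mirror the diff but are invented for this illustration; the point is only that the `pr_size == 0` guard makes repeated clone-creates call the init exactly once, which is also why the init could simply live in an attach routine instead.

```c
#include <assert.h>
#include <stddef.h>

struct pool {
	size_t pr_size;		/* zero until initialized, as in the diff's guard */
	int init_calls;		/* instrumentation for this sketch only */
};

struct pool span_pool;		/* zero-initialized, like kernel bss */

void
pool_init_model(struct pool *pp, size_t size)
{
	pp->pr_size = size;
	pp->init_calls++;
}

void
clone_create_model(void)
{
	/*
	 * the guard makes repeated creates idempotent; doing the init once
	 * at attach time would remove this runtime check entirely
	 */
	if (span_pool.pr_size == 0)
		pool_init_model(&span_pool, 32);
}
```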
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> Hello,
>
> I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> Also I've used 7.1 for a limited time and there were no crash.
> After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
> Here crash report and dmesg:
>
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> end trace frame: 0x800025575290, count: 0
> ddb{1}> show panic
>
> *cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
>
> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25

diff below attempts to fix this particular panic triggered by the veb_span()
function. This is a fairly simple/straightforward change:

	we grab references to veb ports inside the SMR read section.
	we keep those references in a singly linked list.
	as soon as we leave the SMR read section we process the list:
		dispatch packets
		drop references to ports

The change may uncover similar panics in other veb/bridge areas.

diff applies to current

thanks for testing and reporting back.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
index 2976cc200f1..a02dbac887f 100644
--- a/sys/net/if_veb.c
+++ b/sys/net/if_veb.c
@@ -159,6 +159,11 @@ struct veb_softc {
 	struct veb_ports sc_spans;
 };
 
+struct veb_span_port {
+	SLIST_ENTRY(veb_span_port) sp_entry;
+	struct veb_port *sp_port;
+};
+
 #define DPRINTF(_sc, fmt...)	do { \
 	if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
 		printf(fmt); \
@@ -225,6 +230,7 @@ static struct if_clone veb_cloner =
     IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
 
 static struct pool veb_rule_pool;
+static struct pool span_port_pool;
 
 static int	vport_clone_create(struct if_clone *, int);
 static int	vport_clone_destroy(struct ifnet *);
@@ -266,6 +272,11 @@ veb_clone_create(struct if_clone *ifc, int unit)
 		    0, IPL_SOFTNET, 0, "vebrpl", NULL);
 	}
 
+	if (span_port_pool.pr_size == 0) {
+		pool_init(&span_port_pool, sizeof(struct veb_span_port),
+		    0, IPL_SOFTNET, 0, "vebspl", NULL);
+	}
+
 	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
 	if (sc == NULL)
 		return (ENOMEM);
@@ -352,22 +363,38 @@ veb_span(struct veb_softc *sc, struct mbuf *m0)
 	struct veb_port *p;
 	struct ifnet *ifp0;
 	struct mbuf *m;
+	struct veb_span_port *sp;
+	SLIST_HEAD(, veb_span_port) span_list;
 
+	SLIST_INIT(&span_list);
 	smr_read_enter();
 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 		ifp0 = p->p_ifp0;
 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 			continue;
 
-		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
-		if (m == NULL) {
-			/* XXX count error */
-			continue;
-		}
+		sp = pool_get(&span_port_pool, PR_NOWAIT);
+		if (sp == NULL)
+			continue;	/* XXX count error */
 
-		if_enqueue(ifp0, m);	/* XXX count error */
+		veb_eb_brport_take(p);
+		sp->sp_port = p;
+		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
 	}
 	smr_read_leave();
+
+	while (!SLIST_EMPTY(&span_list)) {
+		sp = SLIST_FIRST(&span_list);
+		SLIST_REMOVE_HEAD(&span_list, sp_entry);
+
+		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+		if (m != NULL)
+			if_enqueue(sp->sp_port->p_ifp0, m);
+		/* XXX count error */
+
+		veb_eb_brport_rele(sp->sp_port);
+		pool_put(&span_port_pool, sp);
+	}
 }
 
 static int
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> Also I've used 7.1 for a limited time and there were no crash.
> After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.

For me this looks like a bug in veb(4).

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

veb_port_input -> veb_broadcast -> smr_read_enter; tp->p_enqueue ->
vport_if_enqueue -> if_vinput -> ifp->if_input -> ether_input ->
ipv4_input -> ip_input_if -> pf_test -> PF_LOCK -> rw_enter_write()

After calling smr_read_enter() sleeping is not allowed according to the man
page. pf sleeps because it uses a read-write lock. It looks like we have
some contention on the pf lock. With more forwarding threads, sleep in pf is
more likely.
> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

Can some veb or smr hacker explain how this is supposed to work?

Sleeping in pf is also not ideal as it is in the hot path and slows
down packets.  But that is not easy to fix as we have to refactor the
memory allocations before converting the pf lock to a mutex.  sashan@
is working on that.

bluhm
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Barbaros,

thank you for testing and the excellent report.

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40

The assert indicates we attempt to sleep inside an SMR section, which
must be avoided.

> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

Above is the call stack which has done the bad thing (sleeping inside
an SMR section).  In my opinion the primary suspect is
veb_port_input(), whose code reads as follows:

 966 static struct mbuf *
 967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
 968 {
 969 	struct veb_port *p = brport;
 970 	struct veb_softc *sc = p->p_veb;
 971 	struct ifnet *ifp = &sc->sc_if;
 972 	struct ether_header *eh;
...
1021 	counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
1022 	    m->m_pkthdr.len);
1023 
1024 	/* force packets into the one routing domain for pf */
1025 	m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
1026 
1027 #if NBPFILTER > 0
1028 	if_bpf = READ_ONCE(ifp->if_bpf);
1029 	if (if_bpf != NULL) {
1030 		if (bpf_mtap_ether(if_bpf, m, 0) != 0)
1031 			goto drop;
1032 	}
1033 #endif
1034 
1035 	veb_span(sc, m);
1036 
1037 	if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
1038 	    veb_ip_filter(m))
1039 		goto drop;
1040 
1041 	if (!ISSET(ifp->if_flags, IFF_LINK0) &&
1042 	    veb_vlan_filter(m))
1043 		goto drop;
1044 
1045 	if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
1046 		goto drop;

The call to veb_span() at line 1035 seems to be the culprit (in my
opinion):

 356 	smr_read_enter();
 357 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 358 		ifp0 = p->p_ifp0;
 359 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 360 			continue;
 361 
 362 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 363 		if (m == NULL) {
 364 			/* XXX count error */
 365 			continue;
 366 		}
 367 
 368 		if_enqueue(ifp0, m);	/* XXX count error */
 369 	}
 370 	smr_read_leave();

The loop above comes from veb_span(), which calls if_enqueue() from
within an SMR section.  Line 368 calls here:

2191 static int
2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
2193 {
2194 	/*
2195 	 * switching an l2 packet toward a vport means pushing it
2196 	 * into the network stack. this function exists to make
2197 	 * if_vinput compat with veb calling if_enqueue.
2198 	 */
2199 
2200 	if_vinput(ifp, m);
2201 
2202 	return (0);
2203 }

which in turn calls if_vinput(), which calls further down into the IP
stack, and the IP stack may sleep.  We must change veb_span() such that
calls to if_vinput() happen outside of the SMR section.

I don't have such a complex setup with vlans and virtual ports.  I'll
try to cook a diff and pass it to you for testing.

Thanks again for coming back to us with the report.

regards
sashan
7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
Also I've used 7.1 for a limited time and there were no crash.
After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
Here is the crash report and dmesg:

ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
end trace frame: 0x800025575290, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> show panic
*cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163

ddb{1}> trace
db_enter() at db_enter+0x10
panic(81f22e39) at panic+0xbf
__assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
assertwaitok() at assertwaitok+0xcc
mi_switch() at mi_switch+0x40
sleep_finish(800025574da0,1) at sleep_finish+0x10b
rw_enter(822cfe50,1) at rw_enter+0x1cb
pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
ether_input(8095a050,fd806064c200) at ether_input+0x85
if_input_process(8095a050,800025575358) at if_input_process+0x6f
ifiq_process(8095a460) at ifiq_process+0x69
taskq_thread(80035080) at taskq_thread+0x100
end trace frame: 0x0, count: -19

ddb{1}> ps /o
    TID    PID    UID  PRFLAGS  PFLAGS  CPU  COMMAND
 422021  80579      0      0x2      07  ifconfig
 292011  89065020x12   0x4008  mariadbd
 427181  89065020x12  0x4006K  mariadbd
  86788  89065020x12   0x4003  mariadbd
 302453  98158      0  0x14000  0x2009  softnet
  88346  66890      0  0x14000  0x2005  softnet

ddb{1}> machine ddbcpu 2
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: 10

ddb{2}> trace
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 2
Invalid cpu 2
ddb{2}> t[A[A Bad character
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 3
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: 6

ddb{3}> trace
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: -9

ddb{3}> machine ddbcpu 4
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1d5ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1d5ff0) at