Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Mon, May 16, 2022 at 05:06:28PM +0200, Claudio Jeker wrote:
> > In veb configuration we are holding the netlock and sleep in
> > smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
> > is fine here.
>
> Are you sure about this? smr_barrier() on busy systems with many cpus can
> take 100ms and more. Isn't all processing stopped during that time (IIRC
> the code uses the write exclusive netlock).

Interface ioctls are called with the netlock held. veb_ioctl() calls
veb_add_port() and veb_del_port(), which call smr_barrier(). I did
not notice a 100ms network sleep during ifconfig. I would not
consider it an urgent problem. If you optimize for one thing, you
pay the price somewhere else.

> Sleeping with a global rwlock held is not a good design.

And here we are back to the net lock design discussion we had with
mpi@ years ago. More important is progress. We should not spend too
much time on design discussion; better to solve problems.

My plan is, in more or less this order:
- use exclusive netlock for configuration
- use shared netlock for parts that are MP safe
- kernel lock together with net lock allows quick fixes to make
  progress without making everything MP perfect
- replace kernel lock with mutex
- use something smarter than mutex to make things fast

Others have other plans, so we have different maturity of MP support
in the kernel.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Sat, May 14, 2022 at 12:41:00AM +0200, Alexander Bluhm wrote:
> On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> > at this point we hold a NET_LOCK(). So basically if there won't
> > be enough memory we might start sleeping waiting for memory
> > while we will be holding a NET_LOCK.
> >
> > This is something we should try to avoid, however this can be
> > sorted out later. At this point I just want to point out
> > this problem, which can certainly be solved in a follow-up
> > commit. pf(4) also has its homework to be done around
> > sleeping mallocs.
>
> I think sleeping with netlock is not a problem in general.
>
> In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
> or copyout(). This results in a lock order reversal due to a hack
> in uvn_io(). In my opinion we should not sleep within the pf lock, so
> we can convert it to a mutex or something better later.
>
> In veb configuration we are holding the netlock and sleep in
> smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
> is fine here.

Are you sure about this? smr_barrier() on busy systems with many cpus can
take 100ms and more. Isn't all processing stopped during that time (IIRC
the code uses the write exclusive netlock)?

Also for smr_barrier() the ownership of the object should have already
been moved exclusively to the thread, so it can call smr_barrier()
without holding any locks. I feel the same is true for refcnt_finalize().
Sleeping with a global rwlock held is not a good design.

> Holding the netlock and sleeping in m_get() is worse. There is no
> recovery after reaching the mbuf limit. Sleeping rules are
> inconsistent and depend on the area of the stack. Different people
> have multiple ideas how it should be done.

--
:wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 05:53:27PM +0200, Alexandr Nedvedicky wrote:
> at this point we hold a NET_LOCK(). So basically if there won't
> be enough memory we might start sleeping waiting for memory
> while we will be holding a NET_LOCK.
>
> This is something we should try to avoid, however this can be
> sorted out later. At this point I just want to point out
> this problem, which can certainly be solved in a follow-up
> commit. pf(4) also has its homework to be done around
> sleeping mallocs.

I think sleeping with netlock is not a problem in general.

In pf(4) ioctl we sleep with netlock and pflock while doing copyin()
or copyout(). This results in a lock order reversal due to a hack
in uvn_io(). In my opinion we should not sleep within the pf lock, so
we can convert it to a mutex or something better later.

In veb configuration we are holding the netlock and sleep in
smr_barrier() and refcnt_finalize(). An additional sleep in malloc()
is fine here.

Holding the netlock and sleeping in m_get() is worse. There is no
recovery after reaching the mbuf limit. Sleeping rules are
inconsistent and depend on the area of the stack. Different people
have multiple ideas how it should be done.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?

Thanks for having a look. I added veb(4) to my setup. With this
diff, I cannot trigger a crash anymore.

OK bluhm@

> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
>
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
>
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
>
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.
> > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port *); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i > return (ENOMEM); > > rw_init(>sc_rule_lock, "vebrlk"); > - 
SMR_TAILQ_INIT(>sc_ports.l_list); > - SMR_TAILQ_INIT(>sc_spans.l_list); > + sc->sc_ports = NULL; > + sc->sc_spans = NULL; > > ifp = >sc_if; > > @@ -314,7 +331,10 @@ static int > veb_clone_destroy(struct ifnet *ifp) > { > struct veb_softc *sc = ifp->if_softc; > - struct veb_port *p, *np; > + struct veb_ports *mp, *ms; > + struct veb_port **ps; > + struct veb_port *p; > + unsigned int i; > > NET_LOCK(); > sc->sc_dead = 1; > @@ -326,10 +346,60 @@ veb_clone_destroy(struct ifnet *ifp) > if_detach(ifp); > > NET_LOCK(); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_ports.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > - SMR_TAILQ_FOREACH_SAFE_LOCKED(p, >sc_spans.l_list, p_entry, np) > - veb_p_dtor(sc, p, "destroy"); > + > + /* > + * this is an upside down version of veb_p_dtor() and > + * veb_ports_destroy() to avoid a lot of malloc/free and > + * smr_barrier calls if we remove ports one by one. > + */ > + > + mp = SMR_PTR_GET_LOCKED(>sc_ports); > + SMR_PTR_SET_LOCKED(>sc_ports, NULL); > + if (mp != NULL) { > + ps = veb_ports_array(mp); > + for (i = 0; i < mp->m_count; i++) { > + veb_p_unlink(sc, ps[i]); > + } > + } > +
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Dave,

> > sorry i'm late to the party. can you try this diff?

glad to see you are here. I think your diff looks good. I'm just
concerned about the memory allocation in veb_ports_insert(). The memory
is allocated with the `M_WAITOK` flag, which essentially means we may
give up the CPU. veb_ports_insert() is being called from veb_add_port()
here:

  1500		}
  1501
  1502		p->p_brport.eb_port_take = veb_eb_brport_take;
  1503		p->p_brport.eb_port_rele = veb_eb_brport_rele;
  1504
  1505		om = SMR_PTR_GET_LOCKED(ports_ptr);
  1506		nm = veb_ports_insert(om, p);
  1507
  1508		/* this might have changed if we slept for malloc or ifpromisc */
  1509		error = ether_brport_isset(ifp0);

at this point we hold a NET_LOCK(). So basically if there won't
be enough memory we might start sleeping waiting for memory
while we will be holding a NET_LOCK.

This is something we should try to avoid, however this can be
sorted out later. At this point I just want to point out
this problem, which can certainly be solved in a follow-up
commit. pf(4) also has its homework to be done around
sleeping mallocs.

I think your diff should go in as is.

thanks and regards

OK sashan
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Fri, May 13, 2022 at 12:19:46PM +1000, David Gwynne wrote: > On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote: > > On 12.5.2022. 20:04, Hrvoje Popovski wrote: > > > On 12.5.2022. 16:22, Hrvoje Popovski wrote: > > >> On 12.5.2022. 14:48, Claudio Jeker wrote: > > >>> I think the diff below may be enough to fix this issue. It drops the SMR > > >>> critical secition around the enqueue operation but uses a reference on > > >>> the > > >>> port insteadt to ensure that the device can't be removed during the > > >>> enqueue. Once the enqueue is finished we enter the SMR critical section > > >>> again and drop the reference. > > >>> > > >>> To make it clear that the SMR_TAILQ remains intact while a refcount is > > >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is > > >>> not strictly needed since the next pointer remains valid up until the > > >>> smr_barrier() call but I find this a bit easier to understand. > > >>> First make sure nobody else holds a reference then remove the port from > > >>> the list. > > >>> > > >>> I currently have no test setup to verify this but I hope someone else > > >>> can > > >>> give this a spin. > > >> Hi, > > >> > > >> for now veb seems stable and i can't panic box although it should, but > > >> please give me few more hours to torture it properly. > > > > > > > > > I can trigger panic in veb with this diff. > > > > > > Thank you .. > > > > > > > > > > I can't trigger ... :))) sorry .. > > sorry i'm late to the party. can you try this diff? > > this diff replaces the list of ports with an array/map of ports. > the map takes references to all the ports, so the forwarding paths > just have to hold a reference to the map to be able to use all the > ports. the forwarding path uses smr to get hold of a map, takes a map > ref, and then leaves the smr crit section before iterating over the map > and pushing packets. 
> > this means we should only take and release a single refcnt when > we're pushing packets out any number of ports. > > if no span ports are configured, then there's no span port map and > we don't try and take a ref, we can just return early. > > we also only take and release a single refcnt when we forward the > actual packet. forwarding to a single port provided by an etherbridge > lookup already takes/releases the single port ref. if it falls > through that for unknown unicast or broadcast/multicast, then it's > a single refcnt for the current map of all ports. This is very smart and probably better since it uses less atomic instructions per packet. Diff reads fine. OK claudio@ > Index: if_veb.c > === > RCS file: /cvs/src/sys/net/if_veb.c,v > retrieving revision 1.25 > diff -u -p -r1.25 if_veb.c > --- if_veb.c 4 Jan 2022 06:32:39 - 1.25 > +++ if_veb.c 13 May 2022 02:01:43 - > @@ -139,13 +139,13 @@ struct veb_port { > struct veb_rule_list p_vr_list[2]; > #define VEB_RULE_LIST_OUT0 > #define VEB_RULE_LIST_IN 1 > - > - SMR_TAILQ_ENTRY(veb_port)p_entry; > }; > > struct veb_ports { > - SMR_TAILQ_HEAD(, veb_port) l_list; > - unsigned int l_count; > + struct refcntm_refs; > + unsigned int m_count; > + > + /* followed by an array of veb_port pointers */ > }; > > struct veb_softc { > @@ -155,8 +155,8 @@ struct veb_softc { > struct etherbridge sc_eb; > > struct rwlocksc_rule_lock; > - struct veb_ports sc_ports; > - struct veb_ports sc_spans; > + struct veb_ports*sc_ports; > + struct veb_ports*sc_spans; > }; > > #define DPRINTF(_sc, fmt...)do { \ > @@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u > static int veb_p_output(struct ifnet *, struct mbuf *, > struct sockaddr *, struct rtentry *); > > -static void veb_p_dtor(struct veb_softc *, struct veb_port *, > - const char *); > +static inline size_t > +veb_ports_size(unsigned int n) > +{ > + /* use of _ALIGN is inspired by CMSGs */ > + return _ALIGN(sizeof(struct veb_ports)) + > + n * sizeof(struct veb_port 
*); > +} > + > +static inline struct veb_port ** > +veb_ports_array(struct veb_ports *m) > +{ > + return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m))); > +} > + > +static void veb_ports_free(struct veb_ports *); > + > +static void veb_p_unlink(struct veb_softc *, struct veb_port *); > +static void veb_p_fini(struct veb_port *); > +static void veb_p_dtor(struct veb_softc *, struct veb_port *); > static int veb_add_port(struct veb_softc *, > const struct ifbreq *, unsigned int); > static int veb_del_port(struct veb_softc *, > @@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 13.5.2022. 4:19, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?
>
> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
>
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
>
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
>
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.

Hi,

and with this diff i can't trigger panic ...
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote:
> On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> > On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> >> On 12.5.2022. 14:48, Claudio Jeker wrote:
> >>> I think the diff below may be enough to fix this issue. It drops the SMR
> >>> critical section around the enqueue operation but uses a reference on the
> >>> port instead to ensure that the device can't be removed during the
> >>> enqueue. Once the enqueue is finished we enter the SMR critical section
> >>> again and drop the reference.
> >>>
> >>> To make it clear that the SMR_TAILQ remains intact while a refcount is
> >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> >>> not strictly needed since the next pointer remains valid up until the
> >>> smr_barrier() call but I find this a bit easier to understand.
> >>> First make sure nobody else holds a reference then remove the port from
> >>> the list.
> >>>
> >>> I currently have no test setup to verify this but I hope someone else can
> >>> give this a spin.
> >> Hi,
> >>
> >> for now veb seems stable and i can't panic box although it should, but
> >> please give me few more hours to torture it properly.
> >
> > I can trigger panic in veb with this diff.
> >
> > Thank you ..
>
> I can't trigger ... :))) sorry ..

sorry i'm late to the party. can you try this diff?

this diff replaces the list of ports with an array/map of ports.
the map takes references to all the ports, so the forwarding paths
just have to hold a reference to the map to be able to use all the
ports. the forwarding path uses smr to get hold of a map, takes a map
ref, and then leaves the smr crit section before iterating over the map
and pushing packets.

this means we should only take and release a single refcnt when
we're pushing packets out any number of ports.

if no span ports are configured, then there's no span port map and
we don't try and take a ref, we can just return early.
we also only take and release a single refcnt when we forward the
actual packet. forwarding to a single port provided by an etherbridge
lookup already takes/releases the single port ref. if it falls
through that for unknown unicast or broadcast/multicast, then it's
a single refcnt for the current map of all ports.

Index: if_veb.c
===================================================================
RCS file: /cvs/src/sys/net/if_veb.c,v
retrieving revision 1.25
diff -u -p -r1.25 if_veb.c
--- if_veb.c	4 Jan 2022 06:32:39 -0000	1.25
+++ if_veb.c	13 May 2022 02:01:43 -0000
@@ -139,13 +139,13 @@ struct veb_port {
 	struct veb_rule_list	p_vr_list[2];
 #define VEB_RULE_LIST_OUT		0
 #define VEB_RULE_LIST_IN		1
-
-	SMR_TAILQ_ENTRY(veb_port)	p_entry;
 };
 
 struct veb_ports {
-	SMR_TAILQ_HEAD(, veb_port)	l_list;
-	unsigned int			l_count;
+	struct refcnt			m_refs;
+	unsigned int			m_count;
+
+	/* followed by an array of veb_port pointers */
 };
 
 struct veb_softc {
@@ -155,8 +155,8 @@ struct veb_softc {
 	struct etherbridge	sc_eb;
 
 	struct rwlock		sc_rule_lock;
-	struct veb_ports	sc_ports;
-	struct veb_ports	sc_spans;
+	struct veb_ports	*sc_ports;
+	struct veb_ports	*sc_spans;
 };
 
 #define DPRINTF(_sc, fmt...)	do { \
@@ -184,8 +184,25 @@ static int	veb_p_ioctl(struct ifnet *, u
 static int	veb_p_output(struct ifnet *, struct mbuf *,
 		    struct sockaddr *, struct rtentry *);
 
-static void	veb_p_dtor(struct veb_softc *, struct veb_port *,
-		    const char *);
+static inline size_t
+veb_ports_size(unsigned int n)
+{
+	/* use of _ALIGN is inspired by CMSGs */
+	return _ALIGN(sizeof(struct veb_ports)) +
+	    n * sizeof(struct veb_port *);
+}
+
+static inline struct veb_port **
+veb_ports_array(struct veb_ports *m)
+{
+	return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m)));
+}
+
+static void	veb_ports_free(struct veb_ports *);
+
+static void	veb_p_unlink(struct veb_softc *, struct veb_port *);
+static void	veb_p_fini(struct veb_port *);
+static void	veb_p_dtor(struct veb_softc *, struct veb_port *);
 static int	veb_add_port(struct veb_softc *,
 		    const struct ifbreq *, unsigned int);
 static int	veb_del_port(struct veb_softc *,
@@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
 	return (ENOMEM);
 
 	rw_init(&sc->sc_rule_lock, "vebrlk");
-	SMR_TAILQ_INIT(&sc->sc_ports.l_list);
-	SMR_TAILQ_INIT(&sc->sc_spans.l_list);
+	sc->sc_ports = NULL;
+	sc->sc_spans = NULL;
 
 	ifp = &sc->sc_if;
 
@@ -314,7 +331,10 @@ static int
 veb_clone_destroy(struct ifnet *ifp)
 {
 	struct veb_softc *sc = ifp->if_softc;
-	struct
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> On 12.5.2022. 16:22, Hrvoje Popovski wrote:
>> On 12.5.2022. 14:48, Claudio Jeker wrote:
>>> I think the diff below may be enough to fix this issue. It drops the SMR
>>> critical section around the enqueue operation but uses a reference on the
>>> port instead to ensure that the device can't be removed during the
>>> enqueue. Once the enqueue is finished we enter the SMR critical section
>>> again and drop the reference.
>>>
>>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>>> not strictly needed since the next pointer remains valid up until the
>>> smr_barrier() call but I find this a bit easier to understand.
>>> First make sure nobody else holds a reference then remove the port from
>>> the list.
>>>
>>> I currently have no test setup to verify this but I hope someone else can
>>> give this a spin.
>> Hi,
>>
>> for now veb seems stable and i can't panic box although it should, but
>> please give me few more hours to torture it properly.
>
> I can trigger panic in veb with this diff.
>
> Thank you ..

I can't trigger ... :))) sorry ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> On 12.5.2022. 14:48, Claudio Jeker wrote:
>> I think the diff below may be enough to fix this issue. It drops the SMR
>> critical section around the enqueue operation but uses a reference on the
>> port instead to ensure that the device can't be removed during the
>> enqueue. Once the enqueue is finished we enter the SMR critical section
>> again and drop the reference.
>>
>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>> not strictly needed since the next pointer remains valid up until the
>> smr_barrier() call but I find this a bit easier to understand.
>> First make sure nobody else holds a reference then remove the port from
>> the list.
>>
>> I currently have no test setup to verify this but I hope someone else can
>> give this a spin.
> Hi,
>
> for now veb seems stable and i can't panic box although it should, but
> please give me few more hours to torture it properly.

I can trigger panic in veb with this diff.

Thank you ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> I think the diff below may be enough to fix this issue. It drops the SMR
> critical section around the enqueue operation but uses a reference on the
> port instead to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
>
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
>
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

I agree this diff is far better than my version. I had not realized we
can actually do a dance with smr_read_enter()/smr_read_leave(). I
believe it should work and fix the problem. Here is my OK (if my OK
counts here).
OK sashan

> --
> :wq Claudio
>
> Index: if_veb.c
> ===================================================================
> RCS file: /cvs/src/sys/net/if_veb.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 if_veb.c
> --- if_veb.c	4 Jan 2022 06:32:39 -0000	1.25
> +++ if_veb.c	12 May 2022 12:06:28 -0000
> @@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb
>  			continue;
>  		}
>  
> +		veb_eb_brport_take(p);
> +		smr_read_leave();
>  		if_enqueue(ifp0, m); /* XXX count error */
> +		smr_read_enter();
> +		veb_eb_brport_rele(p);
>  	}
>  	smr_read_leave();
>  }
> @@ -904,7 +908,12 @@ veb_broadcast(struct veb_softc *sc, stru
>  			continue;
>  		}
>  
> +
> +		veb_eb_brport_take(tp);
> +		smr_read_leave();
>  		(*tp->p_enqueue)(ifp0, m); /* XXX count error */
> +		smr_read_enter();
> +		veb_eb_brport_rele(tp);
>  	}
>  	smr_read_leave();
>  
> @@ -1934,10 +1943,11 @@ veb_p_dtor(struct veb_softc *sc, struct
>  
>  		port_list = &sc->sc_ports;
>  	}
> +	refcnt_finalize(&p->p_refs, "vebpdtor");
> +
>  	SMR_TAILQ_REMOVE_LOCKED(&port_list->l_list, p, p_entry);
>  	port_list->l_count--;
>  
> -	refcnt_finalize(&p->p_refs, "vebpdtor");
>  	smr_barrier();
>  
>  	veb_rule_list_free(TAILQ_FIRST(&p->p_vrl));
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 14:48, Claudio Jeker wrote:
> I think the diff below may be enough to fix this issue. It drops the SMR
> critical section around the enqueue operation but uses a reference on the
> port instead to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
>
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
>
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

Hi,

for now veb seems stable and i can't panic box although it should, but
please give me few more hours to torture it properly.

I'm doing this in a loop:

ifconfig veb1 destroy
sh /etc/netstart
ifconfig veb0 destroy
sh /etc/netstart
ifconfig vport1 destroy
sh /etc/netstart
ifconfig vport0 destroy
sh /etc/netstart

my config

veb1: flags=a843 index 25 llprio 3
	groups: veb
	ix1 flags=3 port 2 ifpriority 0 ifcost 0
	vport1 flags=3 port 27 ifpriority 0 ifcost 0
veb0: flags=a843 index 26 llprio 3
	groups: veb
	ix0 flags=3 port 1 ifpriority 0 ifcost 0
	vport0 flags=3 port 28 ifpriority 0 ifcost 0
	ix2 flags=100
vport1: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:fa
	index 27 priority 0 llprio 3
	groups: vport
	inet 192.168.111.11 netmask 0xff00 broadcast 192.168.111.255
vport0: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:f8
	index 28 priority 0 llprio 3
	groups: vport
	inet 192.168.100.11 netmask 0xff00 broadcast 192.168.100.255
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 11:01:21AM +0200, Alexandr Nedvedicky wrote: > Hello Hrvoje, > > thank you for testing. > On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote: > > On 10.5.2022. 22:55, Alexander Bluhm wrote: > > > Yes. It is similar. > > > > > > I have read the whole mail thread and the final fix got commited. > > > But it looks incomplete, pf is still sleeping. > > > > > > Hrvoje, can you run the tests again that triggered the panics a > > > year ago? > > > > Hi, > > > > year ago panics was triggered when veb or tpmr bridged traffic. I've > > tried that right now and I couldn't trigger that panics. > > If I put vport and route traffic over veb than I can trigger panic with > > or without vlans as child-iface for veb. > > For panic i need to have pf enabled and to run > > ifconfig veb destroy or ifconfig vlan destroy and sh netstart in loop. > > > > > > this is panic without vlans > > > > ddb{4}> show all locks > > CPU 4: > > exclusive sched_lock _lock r = 0 (0x824819c0) > > #0 witness_lock+0x311 > > #1 sleep_setup+0xa5 > > #2 rw_enter+0x211 > > #3 pf_test+0xcf0 > > #4 ip_output+0x6b7 > > #5 ip_forward+0x2da > > #6 ip_input_if+0x353 > > #7 ipv4_input+0x39 > > #8 ether_input+0x3ad > > #9 vport_if_enqueue+0x19 > > #10 veb_port_input+0x529 > > I suspect we deal with broadcast and this time things start > to go downhill here in veb_broadcast(): > > 875 smr_read_enter(); > 876 SMR_TAILQ_FOREACH(tp, >sc_ports.l_list, p_entry) { > 877 if (rp == tp || (rp->p_protected & tp->p_protected)) { > 878 /* > 879 * don't let Ethernet packets hairpin or > 880 * move between ports in the same protected > 881 * domain(s). 
> 882 */ > 883 continue; > 884 } > 885 > 886 ifp0 = tp->p_ifp0; > 887 if (!ISSET(ifp0->if_flags, IFF_RUNNING)) { > 888 /* don't waste time */ > 889 continue; > 890 } > 891 > 892 if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) && > 893 !ISSET(m0->m_flags, M_BCAST | M_MCAST)) { > 894 /* don't flood unknown unicast */ > 895 continue; > 896 } > 897 > 898 if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0, src, dst)) > 899 continue; > 900 > 901 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT); > 902 if (m == NULL) { > 903 /* XXX count error? */ > 904 continue; > 905 } > 906 > 907 (*tp->p_enqueue)(ifp0, m); /* XXX count error */ > 908 } > 909 smr_read_leave(); > > the veb_broadcast() is being called from veb_port_input(): > 1082 > 1083 /* unknown unicast address */ > 1084 } else { > 1085 SET(m->m_flags, ETH64_IS_BROADCAST(dst) ? M_BCAST : > M_MCAST); > 1086 } > 1087 > 1088 veb_broadcast(sc, p, m, src, dst); > 1089 return (NULL); > > I think I've spotted the place yesterday night in code, but could not > figure out how to wire things up to trigger the panic. Fortunately you've > managed to figure it out. > > I think vport_if_enqueue() is a trap/land mine here. The function should > really > be queueing a packet instead of dispatching it right away. > I think the diff below may be enough to fix this issue. It drops the SMR critical secition around the enqueue operation but uses a reference on the port insteadt to ensure that the device can't be removed during the enqueue. Once the enqueue is finished we enter the SMR critical section again and drop the reference. To make it clear that the SMR_TAILQ remains intact while a refcount is held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is not strictly needed since the next pointer remains valid up until the smr_barrier() call but I find this a bit easier to understand. First make sure nobody else holds a reference then remove the port from the list. 
I currently have no test setup to verify this but I hope someone else can give this a spin. -- :wq Claudio Index: if_veb.c === RCS file: /cvs/src/sys/net/if_veb.c,v retrieving revision 1.25 diff -u -p -r1.25 if_veb.c --- if_veb.c4 Jan 2022 06:32:39 - 1.25 +++ if_veb.c12 May 2022 12:06:28 - @@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb continue; } + veb_eb_brport_take(p); + smr_read_leave(); if_enqueue(ifp0, m); /* XXX count error */ + smr_read_enter(); +
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Hrvoje, thank you for testing.

On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote:
> On 10.5.2022. 22:55, Alexander Bluhm wrote:
> > Yes. It is similar.
> >
> > I have read the whole mail thread and the final fix got committed.
> > But it looks incomplete, pf is still sleeping.
> >
> > Hrvoje, can you run the tests again that triggered the panics a
> > year ago?
>
> Hi,
>
> a year ago panics were triggered when veb or tpmr bridged traffic. I've
> tried that right now and I couldn't trigger those panics.
> If I put vport and route traffic over veb then I can trigger a panic with
> or without vlans as child-iface for veb.
> For the panic I need to have pf enabled and to run
> ifconfig veb destroy or ifconfig vlan destroy and sh netstart in a loop.
>
> this is panic without vlans
>
> ddb{4}> show all locks
> CPU 4:
> exclusive sched_lock &sched_lock r = 0 (0x824819c0)
> #0 witness_lock+0x311
> #1 sleep_setup+0xa5
> #2 rw_enter+0x211
> #3 pf_test+0xcf0
> #4 ip_output+0x6b7
> #5 ip_forward+0x2da
> #6 ip_input_if+0x353
> #7 ipv4_input+0x39
> #8 ether_input+0x3ad
> #9 vport_if_enqueue+0x19
> #10 veb_port_input+0x529

I suspect we deal with broadcast, and this time things start to go downhill
here in veb_broadcast():

 875 	smr_read_enter();
 876 	SMR_TAILQ_FOREACH(tp, &sc->sc_ports.l_list, p_entry) {
 877 		if (rp == tp || (rp->p_protected & tp->p_protected)) {
 878 			/*
 879 			 * don't let Ethernet packets hairpin or
 880 			 * move between ports in the same protected
 881 			 * domain(s).
 882 			 */
 883 			continue;
 884 		}
 885
 886 		ifp0 = tp->p_ifp0;
 887 		if (!ISSET(ifp0->if_flags, IFF_RUNNING)) {
 888 			/* don't waste time */
 889 			continue;
 890 		}
 891
 892 		if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) &&
 893 		    !ISSET(m0->m_flags, M_BCAST | M_MCAST)) {
 894 			/* don't flood unknown unicast */
 895 			continue;
 896 		}
 897
 898 		if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0, src, dst))
 899 			continue;
 900
 901 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 902 		if (m == NULL) {
 903 			/* XXX count error? */
 904 			continue;
 905 		}
 906
 907 		(*tp->p_enqueue)(ifp0, m); /* XXX count error */
 908 	}
 909 	smr_read_leave();

veb_broadcast() is being called from veb_port_input():

1082
1083 	/* unknown unicast address */
1084 	} else {
1085 		SET(m->m_flags, ETH64_IS_BROADCAST(dst) ? M_BCAST : M_MCAST);
1086 	}
1087
1088 	veb_broadcast(sc, p, m, src, dst);
1089 	return (NULL);

I think I spotted the place in the code yesterday night, but could not figure
out how to wire things up to trigger the panic. Fortunately you've managed to
figure it out.

I think vport_if_enqueue() is a trap/land mine here. The function should
really be queueing a packet instead of dispatching it right away.

thanks and regards
sashan
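The pattern under discussion in this thread — take references while inside the SMR read section, and defer the delivery (which may sleep) until after leaving it — can be modelled in userspace. The sketch below is illustrative only, not kernel code; all names in it (`broadcast`, `deliver`, `in_read_section`) are invented, and the comments note which kernel primitives they stand in for.

```c
#include <assert.h>
#include <stddef.h>

struct port {
	int refs;	/* models veb_eb_brport_take()/_rele() counting */
	int running;	/* models IFF_RUNNING */
	int delivered;
};

static int in_read_section;	/* models being inside smr_read_enter() */

static void
deliver(struct port *p)
{
	/*
	 * stands in for if_enqueue()/if_vinput(): it may sleep,
	 * so it must never run while the read section is held
	 */
	assert(in_read_section == 0);
	p->delivered++;
}

int
broadcast(struct port *ports, size_t n)
{
	struct port *todo[16];	/* collected references, bounded for the sketch */
	size_t i, ntodo = 0;

	in_read_section = 1;			/* smr_read_enter() */
	for (i = 0; i < n && ntodo < 16; i++) {
		if (!ports[i].running)
			continue;
		ports[i].refs++;		/* take a reference, no sleep */
		todo[ntodo++] = &ports[i];
	}
	in_read_section = 0;			/* smr_read_leave() */

	/* now it is safe to do work that may sleep */
	for (i = 0; i < ntodo; i++) {
		deliver(todo[i]);
		todo[i]->refs--;		/* drop the reference */
	}
	return (int)ntodo;
}
```

The key property is that no call that can sleep happens between the enter and leave markers; the references keep the ports alive until delivery is done.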
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 10:29:57AM +0200, Claudio Jeker wrote:
> On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > > Can we limit the number of span ports per bridge to a small number so
> > > that instead of a heap object for the SLIST a simple stack array of
> > > MAX_SPAN_PORTS pointers could be used?
> > >
> > > Who needs more than a handful of span ports per veb?
> >
> > I just want to make sure I follow your idea...
> >
> > > > 8<---8<---8<--8<
> > > > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > > > index 2976cc200f1..a02dbac887f 100644
> >
> > > > +	struct veb_span_port *sp;
> > > > +	SLIST_HEAD(, veb_span_port) span_list;
> >
> > so span_list will be turned into an array of references, right?
> >
> > > >
> > > > +	SLIST_INIT(&span_list);
> > > > 	smr_read_enter();
> > > > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > > > 		ifp0 = p->p_ifp0;
> > > > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > > > 			continue;
> > > >
> > > > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > > > -		if (m == NULL) {
> > > > -			/* XXX count error */
> > > > -			continue;
> > > > -		}
> > > > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > > > +		if (sp == NULL)
> > > > +			continue;	/* XXX count error */
> > > >
> > > > -		if_enqueue(ifp0, m);	/* XXX count error */
> > > > +		veb_eb_brport_take(p);
> > > > +		sp->sp_port = p;
> > > > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
> >
> > here we do instead of SLIST_INSERT_HEAD() something like:
> > 	span_list[i++] = p;
> >
> > > > 	}
> > > > 	smr_read_leave();
> > > > +
> > > > +	while (!SLIST_EMPTY(&span_list)) {
> >
> > and here instead of walking the list we just walk through the array.
>
> Yes, this is my basic idea.
>
> > is that somewhat correct? If so then I need some 'qualified' suggestion
> > for the upper limit of span ports. I'm afraid my real life experience
> > with network switches is zero, so I don't feel like the right person.
>
> I would suggest an upper limit of 8. For example it seems Cisco only
> supports 2 span sessions so 8 should be plenty.

Now looking at the original panic again I realized that the problem is
probably not veb_span() but actually veb_broadcast(). Both functions suffer
from a similar issue: the smr block covers too much. For veb_broadcast the
trick with the limit will not work. Using a service task just for broadcast
is also not a good option since it reorders packets.

> > another idea I'm toying with is to add a 'service task' to every switch,
> > so the underlying `vport_if_enqueue()` will put the packet to a queue and
> > let the task dispatch it to the ifp associated with the vport. Once the
> > packet is dispatched the task will drop the reference to the port.
>
> Need to look more into this but in general I dislike adding more queues
> and tasks. Such a setup would just synchronize everything back to a single
> thread.

-- 
:wq Claudio
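The bounded-array alternative proposed above can be sketched in userspace as follows. This is a hedged illustration, not the kernel implementation: `MAX_SPAN_PORTS`, `span_snap` and `span_collect` are made-up names, and the real code would take and later release port references on each stored pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPAN_PORTS 8	/* the upper limit suggested in the thread */

struct span_snap {
	void *ports[MAX_SPAN_PORTS];	/* snapshot taken under the read section */
	size_t n;
	unsigned long overflowed;	/* ports skipped because the array was full */
};

size_t
span_collect(struct span_snap *s, void **all, size_t nall)
{
	size_t i;

	s->n = 0;
	s->overflowed = 0;

	/* smr_read_enter() would go here in the real code */
	for (i = 0; i < nall; i++) {
		if (s->n < MAX_SPAN_PORTS)
			s->ports[s->n++] = all[i];
		else
			s->overflowed++;	/* XXX count error */
	}
	/* smr_read_leave(); afterwards m_dup_pkt() + if_enqueue() per entry */

	return s->n;
}
```

The attraction is that no allocation can fail inside the read section; the cost is the hard cap, which is why the thread debates what limit is acceptable.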
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 10.5.2022. 22:55, Alexander Bluhm wrote:
> Yes. It is similar.
>
> I have read the whole mail thread and the final fix got committed.
> But it looks incomplete, pf is still sleeping.
>
> Hrvoje, can you run the tests again that triggered the panics a
> year ago?

Hi,

a year ago panics were triggered when veb or tpmr bridged traffic. I've
tried that right now and I couldn't trigger those panics.
If I put vport and route traffic over veb then I can trigger a panic with
or without vlans as child-iface for veb.
For the panic I need to have pf enabled and to run
ifconfig veb destroy or ifconfig vlan destroy and sh netstart in a loop.

this is panic without vlans

panic: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/sys/kern/subr_xxx.c", line 163
Stopped at db_enter+0x10: popq %rbp
 TID PID UID PRFLAGS PFLAGS CPU COMMAND
 57981 52408 0 0x14000 0x2003 softnet
 18952 62179 0 0x14000 0x2005 softnet
db_enter() at db_enter+0x10
panic(81f36a34) at panic+0xbf
__assert(81faa7fa,81fd47a7,a3,81fe7c9d) at __assert+0x25
assertwaitok() at assertwaitok+0xcc
mi_switch() at mi_switch+0x40
sleep_finish(800022c707a0,1) at sleep_finish+0x10b
rw_enter(822b3ad8,2) at rw_enter+0x232
pf_test(2,3,800c6048,800022c70a58) at pf_test+0xcf0
ip_output(fd80a32f1f00,0,800022c70be8,1,0,0,e8e0f1a7c10273fe) at ip_output+0x6b7
ip_forward(fd80a32f1f00,814ee000,fd83a8657078,0) at ip_forward+0x2da
ip_input_if(800022c70d28,800022c70d34,4,0,814ee000) at ip_input_if+0x353
ipv4_input(814ee000,fd80a32f1f00) at ipv4_input+0x39
ether_input(814ee000,fd80a32f1f00) at ether_input+0x3ad
vport_if_enqueue(814ee000,fd80a32f1f00) at vport_if_enqueue+0x19
end trace frame: 0x800022c70e70, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs.
ddb{4}> ddb{4}> show reg rdi0 rsi 0x14 rbp 0x800022c705f0 rbx 0x800022424ff0 rdx 0x8000 rcx0x286 rax 0x7d r8 0x101010101010101 r9 0 r10 0x5b4ef42a9c796b43 r11 0xada7e964a691819f r12 0x800022425a60 r13 0x800022c450a0 r140 r15 0x81f36a34cmd0646_9_tim_udma+0x2d9d2 rip 0x81c01c30db_enter+0x10 cs 0x8 rflags 0x286 rsp 0x800022c705f0 ss 0x10 db_enter+0x10: popq%rbp ddb{4}> ddb{4}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 14129 480066 92457 0 3 0x3 netlock ifconfig 92457 504149 2002 0 30x10008b sigsusp sh 2002 26492 1517 0 30x10008b sigsusp sh 1517 131574 1 0 30x10008b sigsusp ksh 26252 251094 1 0 30x100098 kqreadcron 20251 457205 97875 95 3 0x1100092 kqreadsmtpd 62139 255853 97875103 3 0x1100092 kqreadsmtpd 29505 64154 97875 95 3 0x1100092 kqreadsmtpd 20035 471489 97875 95 30x100092 kqreadsmtpd 91114 73268 97875 95 3 0x1100092 kqreadsmtpd 78396 414422 97875 95 3 0x1100092 kqreadsmtpd 97875 113010 1 0 30x100080 kqreadsmtpd 21916 226987 1 0 30x88 kqreadsshd 90174247 1 0 30x100080 kqreadntpd 72358 391459 38133 83 30x100092 kqreadntpd 38133 355054 1 83 3 0x1100012 netlock ntpd 91824 285625 60194 73 3 0x1100090 kqreadsyslogd 60194 367623 1 0 30x100082 netio syslogd 73270 113983 0 0 3 0x14200 bored smr 51379 478537 0 0 3 0x14200 pgzerozerothread 85386 54454 0 0 3 0x14200 aiodoned aiodoned 10937 491268 0 0 3 0x14200 syncerupdate 85008 360847 0 0 3 0x14200 cleaner cleaner 76642 501363 0 0 3 0x14200 reaperreaper 32934 257878 0 0 3 0x14200 pgdaemon pagedaemon 48583 371156 0 0 3 0x14200 usbtskusbtask 53660 310701 0 0 3 0x14200 usbatsk usbatsk 19211 31258 0 0 3 0x40014200 acpi0 acpi0 11856 305318 0 0 3 0x40014200idle5 9933 290633 0 0 3 0x40014200idle4 41570 94891 0 0 3 0x40014200
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 09:58:09AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> > Can we limit the number of span ports per bridge to a small number so that
> > instead of a heap object for the SLIST a simple stack array of
> > MAX_SPAN_PORTS pointers could be used?
> >
> > Who needs more than a handful of span ports per veb?
>
> I just want to make sure I follow your idea...
>
> > > 8<---8<---8<--8<
> > > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > > index 2976cc200f1..a02dbac887f 100644
>
> > > +	struct veb_span_port *sp;
> > > +	SLIST_HEAD(, veb_span_port) span_list;
>
> so span_list will be turned into an array of references, right?
>
> > >
> > > +	SLIST_INIT(&span_list);
> > > 	smr_read_enter();
> > > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > > 		ifp0 = p->p_ifp0;
> > > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > > 			continue;
> > >
> > > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > > -		if (m == NULL) {
> > > -			/* XXX count error */
> > > -			continue;
> > > -		}
> > > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > > +		if (sp == NULL)
> > > +			continue;	/* XXX count error */
> > >
> > > -		if_enqueue(ifp0, m);	/* XXX count error */
> > > +		veb_eb_brport_take(p);
> > > +		sp->sp_port = p;
> > > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
>
> here we do instead of SLIST_INSERT_HEAD() something like:
> 	span_list[i++] = p;
>
> > > 	}
> > > 	smr_read_leave();
> > > +
> > > +	while (!SLIST_EMPTY(&span_list)) {
>
> and here instead of walking the list we just walk through the array.

Yes, this is my basic idea.

> is that somewhat correct? If so then I need some 'qualified' suggestion
> for the upper limit of span ports. I'm afraid my real life experience
> with network switches is zero, so I don't feel like the right person.

I would suggest an upper limit of 8. For example it seems Cisco only
supports 2 span sessions so 8 should be plenty.

> another idea I'm toying with is to add a 'service task' to every switch,
> so the underlying `vport_if_enqueue()` will put the packet to a queue and
> let the task dispatch it to the ifp associated with the vport. Once the
> packet is dispatched the task will drop the reference to the port.

Need to look more into this but in general I dislike adding more queues
and tasks. Such a setup would just synchronize everything back to a single
thread.

-- 
:wq Claudio
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> Can we limit the number of span ports per bridge to a small number so that
> instead of a heap object for the SLIST a simple stack array of
> MAX_SPAN_PORTS pointers could be used?
>
> Who needs more than a handful of span ports per veb?

I just want to make sure I follow your idea...

> > 8<---8<---8<--8<
> > diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> > index 2976cc200f1..a02dbac887f 100644

> > +	struct veb_span_port *sp;
> > +	SLIST_HEAD(, veb_span_port) span_list;

so span_list will be turned into an array of references, right?

> >
> > +	SLIST_INIT(&span_list);
> > 	smr_read_enter();
> > 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
> > 		ifp0 = p->p_ifp0;
> > 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
> > 			continue;
> >
> > -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> > -		if (m == NULL) {
> > -			/* XXX count error */
> > -			continue;
> > -		}
> > +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> > +		if (sp == NULL)
> > +			continue;	/* XXX count error */
> >
> > -		if_enqueue(ifp0, m);	/* XXX count error */
> > +		veb_eb_brport_take(p);
> > +		sp->sp_port = p;
> > +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);

here we do instead of SLIST_INSERT_HEAD() something like:
	span_list[i++] = p;

> > 	}
> > 	smr_read_leave();
> > +
> > +	while (!SLIST_EMPTY(&span_list)) {

and here instead of walking the list we just walk through the array.

is that somewhat correct? If so then I need some 'qualified' suggestion
for the upper limit of span ports. I'm afraid my real life experience
with network switches is zero, so I don't feel like the right person.

another idea I'm toying with is to add a 'service task' to every switch,
so the underlying `vport_if_enqueue()` will put the packet to a queue and
let the task dispatch it to the ifp associated with the vport. Once the
packet is dispatched the task will drop the reference to the port.

I had no time to figure out how to wire things up for the task. I might
give it a try tonight.

thanks and regards
sashan
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
On Tue, May 10, 2022 at 12:21:02AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> > Hello,
> >
> > I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> > My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> > Also I've used 7.1 for a limited time and there were no crash.
> > After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
> > Here crash report and dmesg:
> >
> > ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> > ether_input(851c3800,fd806064c200) at ether_input+0x100
> > end trace frame: 0x800025575290, count: 0
> > ddb{1}> show panic
> >
> > *cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
> >
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(81f22e39) at panic+0xbf
> > __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
>
> diff below attempts to fix this particular panic triggered by the
> veb_span() function. This is a fairly simple/straightforward change:
>
> 	we grab references to veb ports inside the SMR read section.
> 	we keep those references in a singly linked list.
> 	as soon as we leave the SMR read section we process the list:
> 		dispatch packets
> 		drop references to ports
>
> The change may uncover similar panics in other veb/bridge areas.
>
> diff applies to current
>
> thanks for testing and reporting back.

Can we limit the number of span ports per bridge to a small number so that
instead of a heap object for the SLIST a simple stack array of
MAX_SPAN_PORTS pointers could be used?

Who needs more than a handful of span ports per veb?
> 8<---8<---8<--8<
> diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
> index 2976cc200f1..a02dbac887f 100644
> --- a/sys/net/if_veb.c
> +++ b/sys/net/if_veb.c
> @@ -159,6 +159,11 @@ struct veb_softc {
>  	struct veb_ports sc_spans;
>  };
> 
> +struct veb_span_port {
> +	SLIST_ENTRY(veb_span_port) sp_entry;
> +	struct veb_port *sp_port;
> +};
> +
>  #define DPRINTF(_sc, fmt...)	do { \
>  	if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
>  		printf(fmt); \
> @@ -225,6 +230,7 @@ static struct if_clone veb_cloner =
>      IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
> 
>  static struct pool veb_rule_pool;
> +static struct pool span_port_pool;
> 
>  static int	vport_clone_create(struct if_clone *, int);
>  static int	vport_clone_destroy(struct ifnet *);
> @@ -266,6 +272,11 @@ veb_clone_create(struct if_clone *ifc, int unit)
>  		    0, IPL_SOFTNET, 0, "vebrpl", NULL);
>  	}
> 
> +	if (span_port_pool.pr_size == 0) {
> +		pool_init(&span_port_pool, sizeof(struct veb_span_port),
> +		    0, IPL_SOFTNET, 0, "vebspl", NULL);
> +	}
> +
>  	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
>  	if (sc == NULL)
>  		return (ENOMEM);
> @@ -352,22 +363,38 @@ veb_span(struct veb_softc *sc, struct mbuf *m0)
>  	struct veb_port *p;
>  	struct ifnet *ifp0;
>  	struct mbuf *m;
> +	struct veb_span_port *sp;
> +	SLIST_HEAD(, veb_span_port) span_list;
> 
> +	SLIST_INIT(&span_list);
>  	smr_read_enter();
>  	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  		ifp0 = p->p_ifp0;
>  		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  			continue;
> 
> -		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> -		if (m == NULL) {
> -			/* XXX count error */
> -			continue;
> -		}
> +		sp = pool_get(&span_port_pool, PR_NOWAIT);
> +		if (sp == NULL)
> +			continue;	/* XXX count error */
> 
> -		if_enqueue(ifp0, m);	/* XXX count error */
> +		veb_eb_brport_take(p);
> +		sp->sp_port = p;
> +		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
>  	}
>  	smr_read_leave();
> +
> +	while (!SLIST_EMPTY(&span_list)) {
> +		sp = SLIST_FIRST(&span_list);
> +		SLIST_REMOVE_HEAD(&span_list, sp_entry);
> +
> +		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
> +		if (m != NULL)
> +			if_enqueue(sp->sp_port->p_ifp0, m);
> +		/* XXX count error */
> +
> +		veb_eb_brport_rele(sp->sp_port);
> +		pool_put(&span_port_pool, sp);
> +	}
>  }
> 
>  static int

-- 
:wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Wed, May 11, 2022 at 12:38:56AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> > Yes. It is similar.
> >
> > I have read the whole mail thread and the final fix got committed.
> > But it looks incomplete, pf is still sleeping.
> >
> > Hrvoje, can you run the tests again that triggered the panics a
> > year ago?
> >
> > Sasha, I still think the way to go is mutex for pf locks. I don't
> > see a performance impact.
>
> you mean performance impact against 7.1? If that is the case,
> then I agree.
>
> > a single CPU anyway. And with sleeping locks you have to schedule
> > in the hot packet path. Our scheduler was never built for that.
>
> I don't think it is a problem of the scheduler. I rather see it as a
> problem of the lock primitives, which can't keep consistency across a
> switch from the CPU (a.k.a. sleep).
>
> > At genua we started with mutex, made it fine grained, and converted
> > to rcu later.
>
> I think the only reason why PF_LOCK() is a writer lock is overload tables.
> A packet which is about to update an overload table requires an exclusive
> PF_LOCK(). As soon as there is a way to update an overload table without
> an exclusive lock, we can turn packets into PF_LOCK() readers.
>
> Also I somewhat disagree with the idea 'the network stack must never
> sleep'. The rw-lock adds places to the network subsystem where the
> scheduler has a chance to give the CPU to someone else.
>
> on the other hand let's stay real. if we want to keep at least part of the
> network stack running in parallel these days, then pf locks need to be
> turned into mutexes, unless we want to hunt down all those SMR read
> sections and fix/work around them to deal with sleeps in the underlying
> call stack.
>
> I think either way (putting mutexes in vs. hunting down sleeping SMR
> sections) is fine for me. My current plan is to move all those memory
> allocations in the ioctl(2) path outside of PF_LOCK() scope.

I do not fully agree with this. SMR sections should be short and not span
deep callstacks.
This is not a good design and finding and fixing these errors is important. -- :wq Claudio
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

> Yes. It is similar.
>
> I have read the whole mail thread and the final fix got committed.
> But it looks incomplete, pf is still sleeping.
>
> Hrvoje, can you run the tests again that triggered the panics a
> year ago?
>
> Sasha, I still think the way to go is mutex for pf locks. I don't
> see a performance impact.

you mean performance impact against 7.1? If that is the case, then I agree.

> a single CPU anyway. And with sleeping locks you have to schedule
> in the hot packet path. Our scheduler was never built for that.

I don't think it is a problem of the scheduler. I rather see it as a problem
of the lock primitives, which can't keep consistency across a switch from the
CPU (a.k.a. sleep).

> At genua we started with mutex, made it fine grained, and converted
> to rcu later.

I think the only reason why PF_LOCK() is a writer lock is overload tables. A
packet which is about to update an overload table requires an exclusive
PF_LOCK(). As soon as there is a way to update an overload table without an
exclusive lock, we can turn packets into PF_LOCK() readers.

Also I somewhat disagree with the idea 'the network stack must never sleep'.
The rw-lock adds places to the network subsystem where the scheduler has a
chance to give the CPU to someone else.

on the other hand let's stay real. if we want to keep at least part of the
network stack running in parallel these days, then pf locks need to be turned
into mutexes, unless we want to hunt down all those SMR read sections and
fix/work around them to deal with sleeps in the underlying call stack.

I think either way (putting mutexes in vs. hunting down sleeping SMR
sections) is fine for me. My current plan is to move all those memory
allocations in the ioctl(2) path outside of PF_LOCK() scope.

thanks and regards
sashan
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote:
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
> > Can some veb or smr hacker explain how this is supposed to work?
> >
> > Sleeping in pf is also not ideal as it is in the hot path and slows
> > down packets. But that is not easy to fix as we have to refactor
> > the memory allocations before converting the pf lock to a mutex.
> > sashan@ is working on that.
>
> Hi,
>
> isn't that similar or the same panic that was talked about in the
> "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

Yes. It is similar.

I have read the whole mail thread and the final fix got committed.
But it looks incomplete, pf is still sleeping.

Hrvoje, can you run the tests again that triggered the panics a year ago?

Sasha, I still think the way to go is mutex for pf locks. I don't see a
performance impact. Without it, we can run the network only on a single CPU
anyway. And with sleeping locks you have to schedule in the hot packet path.
Our scheduler was never built for that.

At genua we started with mutex, made it fine grained, and converted to rcu
later.

bluhm
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

On Tue, May 10, 2022 at 09:37:12PM +0200, Hrvoje Popovski wrote:
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
> > Can some veb or smr hacker explain how this is supposed to work?
> >
> > Sleeping in pf is also not ideal as it is in the hot path and slows
> > down packets. But that is not easy to fix as we have to refactor
> > the memory allocations before converting pf lock to a mutex. sashan@
> > is working on that.

the rw-lock in pf is indeed a next step. The first step is to move memory
allocations done by ioctl(2) out of the network path. The same goes for
copyin()/copyout(). I'm currently looking at art tables. In theory it should
be possible to replace the current radix tables with art tables. If I
understand things right, this would also solve locking on update.

> Hi,
>
> isn't that similar or the same panic that was talked about in the
> "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

this is very similar to the panics you've found. However this time packets
sneak into the sleeping path via a call to veb_span(). I agree the fix looks
odd. Maybe another way around it would be to add a task to every veb switch.
The task would dispatch packets to the slow path.

regards
sashan
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
> On 10 May 2022, at 22:37, Hrvoje Popovski wrote:
>
> On 9.5.2022. 22:04, Alexander Bluhm wrote:
>> Can some veb or smr hacker explain how this is supposed to work?
>>
>> Sleeping in pf is also not ideal as it is in the hot path and slows
>> down packets. But that is not easy to fix as we have to refactor
>> the memory allocations before converting pf lock to a mutex. sashan@
>> is working on that.
>
> Hi,
>
> isn't that similar or same panic that was talked about in "parallel
> forwarding vs. bridges" mail thread on tech@ started by sashan@
>
> https://www.mail-archive.com/tech@openbsd.org/msg64040.html

Yes, both panics were caused by a sleep attempt within an SMR section.
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 9.5.2022. 22:04, Alexander Bluhm wrote: > Can some veb or smr hacker explain how this is supposed to work? > > Sleeping in pf is also not ideal as it is in the hot path and slows > down packets. But that is not easy to fix as we have to refactor > the memory allocations before converting pf lock to a mutex. sashan@ > is working on that. Hi, isn't that similar or same panic that was talked about in "parallel forwarding vs. bridges" mail thread on tech@ started by sashan@ https://www.mail-archive.com/tech@openbsd.org/msg64040.html
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Thanks for your support. I'll try to test when you get it done.

On Mon, May 9, 2022 at 8:51 PM Alexandr Nedvedicky <
alexandr.nedvedi...@oracle.com> wrote:

> Hello Barbaros,
>
> thank you for testing and excellent report.
>
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(81f22e39) at panic+0xbf
> > __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> > assertwaitok() at assertwaitok+0xcc
> > mi_switch() at mi_switch+0x40
>
> the assert indicates we attempt to sleep inside an SMR section,
> which must be avoided.
>
> > sleep_finish(800025574da0,1) at sleep_finish+0x10b
> > rw_enter(822cfe50,1) at rw_enter+0x1cb
> > pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> > ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> > ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> > ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> > ether_input(851c3800,fd806064c200) at ether_input+0x100
> > vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> > ether_input(8095a050,fd806064c200) at ether_input+0x85
> > if_input_process(8095a050,800025575358) at if_input_process+0x6f
> > ifiq_process(8095a460) at ifiq_process+0x69
> > taskq_thread(80035080) at taskq_thread+0x100
>
> above is the call stack which has done the bad thing (sleeping inside an
> SMR section)
>
> in my opinion the primary suspect is veb_port_input(), whose code reads as
> follows:
>
>  966 static struct mbuf *
>  967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
>  968 {
>  969 	struct veb_port *p = brport;
>  970 	struct veb_softc *sc = p->p_veb;
>  971 	struct ifnet *ifp = &sc->sc_if;
>  972 	struct ether_header *eh;
> ...
> 1021 	counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
> 1022 	    m->m_pkthdr.len);
> 1023
> 1024 	/* force packets into the one routing domain for pf */
> 1025 	m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
> 1026
> 1027 #if NBPFILTER > 0
> 1028 	if_bpf = READ_ONCE(ifp->if_bpf);
> 1029 	if (if_bpf != NULL) {
> 1030 		if (bpf_mtap_ether(if_bpf, m, 0) != 0)
> 1031 			goto drop;
> 1032 	}
> 1033 #endif
> 1034
> 1035 	veb_span(sc, m);
> 1036
> 1037 	if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
> 1038 	    veb_ip_filter(m))
> 1039 		goto drop;
> 1040
> 1041 	if (!ISSET(ifp->if_flags, IFF_LINK0) &&
> 1042 	    veb_vlan_filter(m))
> 1043 		goto drop;
> 1044
> 1045 	if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
> 1046 		goto drop;
>
> the call to veb_span() at line 1035 seems to be our culprit (in my
> opinion):
>
>  356 	smr_read_enter();
>  357 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  358 		ifp0 = p->p_ifp0;
>  359 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  360 			continue;
>  361
>  362 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
>  363 		if (m == NULL) {
>  364 			/* XXX count error */
>  365 			continue;
>  366 		}
>  367
>  368 		if_enqueue(ifp0, m);	/* XXX count error */
>  369 	}
>  370 	smr_read_leave();
>
> the loop above comes from veb_span(), which calls if_enqueue() from within
> an SMR section. Line 368 calls here:
>
> 2191 static int
> 2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
> 2193 {
> 2194 	/*
> 2195 	 * switching an l2 packet toward a vport means pushing it
> 2196 	 * into the network stack. this function exists to make
> 2197 	 * if_vinput compat with veb calling if_enqueue.
> 2198 	 */
> 2199
> 2200 	if_vinput(ifp, m);
> 2201
> 2202 	return (0);
> 2203 }
>
> which in turn calls if_vinput(), which calls further down into the IP
> stack, and the IP stack may sleep. We must change veb_span() so that calls
> to if_vinput() happen outside of the SMR section.
>
> I don't have such a complex setup to use vlans and virtual ports. I'll try
> to cook up a diff and pass it to you for testing.
>
> thanks again for coming back to us with the report.
>
> regards
> sashan
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hi,

I'm not a fan of this.

> +	if (span_port_pool.pr_size == 0) {
> +		pool_init(&span_port_pool, sizeof(struct veb_span_port),
> +		    0, IPL_SOFTNET, 0, "vebspl", NULL);
> +	}

Does an initialized pool consume significant resources? Why don't we do this
within vebattach()? This is also true for the `veb_rule_pool' initialization.
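The lazy-initialization guard being questioned here can be modelled with a tiny standalone sketch. The names below loosely mirror the diff but are invented for this illustration; the point is only that the `pr_size == 0` guard makes repeated clone-creates call the init exactly once, which is also why the init could simply live in an attach routine instead.

```c
#include <assert.h>
#include <stddef.h>

struct pool {
	size_t pr_size;		/* zero until initialized, as in the diff's guard */
	int init_calls;		/* instrumentation for this sketch only */
};

struct pool span_pool;		/* zero-initialized, like kernel bss */

void
pool_init_model(struct pool *pp, size_t size)
{
	pp->pr_size = size;
	pp->init_calls++;
}

void
clone_create_model(void)
{
	/*
	 * the guard makes repeated creates idempotent; doing the init once
	 * at attach time would remove this runtime check entirely
	 */
	if (span_pool.pr_size == 0)
		pool_init_model(&span_pool, 32);
}
```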
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> Hello,
>
> I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> Also I've used 7.1 for a limited time and there were no crash.
> After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
> Here crash report and dmesg:
>
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> end trace frame: 0x800025575290, count: 0
> ddb{1}> show panic
>
> *cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163
>
> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25

diff below attempts to fix this particular panic triggered by the veb_span()
function. This is a fairly simple/straightforward change:

	we grab references to veb ports inside the SMR read section.
	we keep those references in a singly linked list.
	as soon as we leave the SMR read section we process the list:
		dispatch packets
		drop references to ports

The change may uncover similar panics in other veb/bridge areas.

diff applies to current

thanks for testing and reporting back.

regards
sashan

8<---8<---8<--8<
diff --git a/sys/net/if_veb.c b/sys/net/if_veb.c
index 2976cc200f1..a02dbac887f 100644
--- a/sys/net/if_veb.c
+++ b/sys/net/if_veb.c
@@ -159,6 +159,11 @@ struct veb_softc {
 	struct veb_ports sc_spans;
 };
 
+struct veb_span_port {
+	SLIST_ENTRY(veb_span_port) sp_entry;
+	struct veb_port *sp_port;
+};
+
 #define DPRINTF(_sc, fmt...)	do { \
 	if (ISSET((_sc)->sc_if.if_flags, IFF_DEBUG)) \
 		printf(fmt); \
@@ -225,6 +230,7 @@ static struct if_clone veb_cloner =
     IF_CLONE_INITIALIZER("veb", veb_clone_create, veb_clone_destroy);
 
 static struct pool veb_rule_pool;
+static struct pool span_port_pool;
 
 static int	vport_clone_create(struct if_clone *, int);
 static int	vport_clone_destroy(struct ifnet *);
@@ -266,6 +272,11 @@ veb_clone_create(struct if_clone *ifc, int unit)
 		    0, IPL_SOFTNET, 0, "vebrpl", NULL);
 	}
 
+	if (span_port_pool.pr_size == 0) {
+		pool_init(&span_port_pool, sizeof(struct veb_span_port),
+		    0, IPL_SOFTNET, 0, "vebspl", NULL);
+	}
+
 	sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK|M_ZERO|M_CANFAIL);
 	if (sc == NULL)
 		return (ENOMEM);
@@ -352,22 +363,38 @@ veb_span(struct veb_softc *sc, struct mbuf *m0)
 	struct veb_port *p;
 	struct ifnet *ifp0;
 	struct mbuf *m;
+	struct veb_span_port *sp;
+	SLIST_HEAD(, veb_span_port) span_list;
 
+	SLIST_INIT(&span_list);
 	smr_read_enter();
 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 		ifp0 = p->p_ifp0;
 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 			continue;
 
-		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
-		if (m == NULL) {
-			/* XXX count error */
-			continue;
-		}
+		sp = pool_get(&span_port_pool, PR_NOWAIT);
+		if (sp == NULL)
+			continue;	/* XXX count error */
 
-		if_enqueue(ifp0, m);	/* XXX count error */
+		veb_eb_brport_take(p);
+		sp->sp_port = p;
+		SLIST_INSERT_HEAD(&span_list, sp, sp_entry);
 	}
 	smr_read_leave();
+
+	while (!SLIST_EMPTY(&span_list)) {
+		sp = SLIST_FIRST(&span_list);
+		SLIST_REMOVE_HEAD(&span_list, sp_entry);
+
+		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
+		if (m != NULL)
+			if_enqueue(sp->sp_port->p_ifp0, m);
+		/* XXX count error */
+
+		veb_eb_brport_rele(sp->sp_port);
+		pool_put(&span_port_pool, sp);
+	}
 }
 
 static int
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On Mon, May 09, 2022 at 06:01:07PM +0300, Barbaros Bilek wrote:
> I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
> My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
> Also I've used 7.1 for a limited time and there were no crash.
> After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.

For me this looks like a bug in veb(4).

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

veb_port_input -> veb_broadcast -> smr_read_enter; tp->p_enqueue ->
vport_if_enqueue -> if_vinput -> ifp->if_input -> ether_input ->
ipv4_input -> ip_input_if -> pf_test -> PF_LOCK -> rw_enter_write()

After calling smr_read_enter() sleeping is not allowed according to the man
page. pf sleeps because it uses a read-write lock. It looks like we have
some contention on the pf lock. With more forwarding threads, sleep in pf is
more likely.
> __mp_lock(823d986c) at __mp_lock+0x72
> wakeup_n(822cfe50,) at wakeup_n+0x32
> pf_test(2,2,80948050,80002557b300) at pf_test+0x11f6
> pf_route(80002557b388,fd89fb379938) at pf_route+0x1f6
> pf_test(2,1,80924050,80002557b598) at pf_test+0xa1f
> ip_input_if(80002557b598,80002557b5a4,4,0,80924050) at ip_input_if+0xcd
> ipv4_input(80924050,fd8053540f00) at ipv4_input+0x39
> ether_input(80924050,fd8053540f00) at ether_input+0x3ad
> if_input_process(80924050,80002557b688) at if_input_process+0x6f
> ifiq_process(80926500) at ifiq_process+0x69
> taskq_thread(80035100) at taskq_thread+0x100

Can some veb or smr hacker explain how this is supposed to work?

Sleeping in pf is also not ideal as it is in the hot path and slows
down packets.  But that is not easy to fix as we have to refactor the
memory allocations before converting the pf lock to a mutex.  sashan@
is working on that.

bluhm
Re: [External] : 7.1-Current crash with NET_TASKQ 4 and veb interface
Hello Barbaros,

thank you for testing and the excellent report.

> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(81f22e39) at panic+0xbf
> __assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40

The assert indicates we attempt to sleep inside an SMR section, which
must be avoided.

> sleep_finish(800025574da0,1) at sleep_finish+0x10b
> rw_enter(822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
> ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
> ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
> ether_input(8520e000,fd8053616700) at ether_input+0x3ad
> vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
> veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
> ether_input(851c3800,fd806064c200) at ether_input+0x100
> vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
> ether_input(8095a050,fd806064c200) at ether_input+0x85
> if_input_process(8095a050,800025575358) at if_input_process+0x6f
> ifiq_process(8095a460) at ifiq_process+0x69
> taskq_thread(80035080) at taskq_thread+0x100

Above is the call stack which has done the bad thing (sleeping inside
an SMR section).  In my opinion the primary suspect is
veb_port_input(), whose code reads as follows:

 966 static struct mbuf *
 967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
 968 {
 969 	struct veb_port *p = brport;
 970 	struct veb_softc *sc = p->p_veb;
 971 	struct ifnet *ifp = &sc->sc_if;
 972 	struct ether_header *eh;
...
1021 	counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
1022 	    m->m_pkthdr.len);
1023 
1024 	/* force packets into the one routing domain for pf */
1025 	m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
1026 
1027 #if NBPFILTER > 0
1028 	if_bpf = READ_ONCE(ifp->if_bpf);
1029 	if (if_bpf != NULL) {
1030 		if (bpf_mtap_ether(if_bpf, m, 0) != 0)
1031 			goto drop;
1032 	}
1033 #endif
1034 
1035 	veb_span(sc, m);
1036 
1037 	if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
1038 	    veb_ip_filter(m))
1039 		goto drop;
1040 
1041 	if (!ISSET(ifp->if_flags, IFF_LINK0) &&
1042 	    veb_vlan_filter(m))
1043 		goto drop;
1044 
1045 	if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
1046 		goto drop;

The call to veb_span() at line 1035 seems to be the culprit (in my
opinion):

 356 	smr_read_enter();
 357 	SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
 358 		ifp0 = p->p_ifp0;
 359 		if (!ISSET(ifp0->if_flags, IFF_RUNNING))
 360 			continue;
 361 
 362 		m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
 363 		if (m == NULL) {
 364 			/* XXX count error */
 365 			continue;
 366 		}
 367 
 368 		if_enqueue(ifp0, m);	/* XXX count error */
 369 	}
 370 	smr_read_leave();

The loop above comes from veb_span(), which calls if_enqueue() from
within an SMR section.  Line 368 calls here:

2191 static int
2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
2193 {
2194 	/*
2195 	 * switching an l2 packet toward a vport means pushing it
2196 	 * into the network stack. this function exists to make
2197 	 * if_vinput compat with veb calling if_enqueue.
2198 	 */
2199 
2200 	if_vinput(ifp, m);
2201 
2202 	return (0);
2203 }

which in turn calls if_vinput(), which calls further down into the IP
stack, and the IP stack may sleep.  We must change veb_span() such that
calls to if_vinput() happen outside of the SMR section.

I don't have such a complex setup with vlans and virtual ports.  I'll
try to cook a diff and pass it to you for testing.

Thanks again for coming back to us with the report.

regards
sashan
7.1-Current crash with NET_TASKQ 4 and veb interface
Hello,

I was using veb (veb+vlan+ixl) interfaces quite stable since 6.9.
My system ran as a firewall under OpenBSD 6.9 and 7.0 quite stable.
Also I've used 7.1 for a limited time and there were no crash.
After OpenBSD's NET_TASKQ upgrade to 4 it crashed after 5 days.
Here is the crash report and dmesg:

ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
end trace frame: 0x800025575290, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.

ddb{1}> show panic
*cpu1: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/usr/src/sys/kern/subr_xxx.c", line 163

ddb{1}> trace
db_enter() at db_enter+0x10
panic(81f22e39) at panic+0xbf
__assert(81f96c9d,81f85ebc,a3,81fd252f) at __assert+0x25
assertwaitok() at assertwaitok+0xcc
mi_switch() at mi_switch+0x40
sleep_finish(800025574da0,1) at sleep_finish+0x10b
rw_enter(822cfe50,1) at rw_enter+0x1cb
pf_test(2,1,8520e000,800025575058) at pf_test+0x1088
ip_input_if(800025575058,800025575064,4,0,8520e000) at ip_input_if+0xcd
ipv4_input(8520e000,fd8053616700) at ipv4_input+0x39
ether_input(8520e000,fd8053616700) at ether_input+0x3ad
vport_if_enqueue(8520e000,fd8053616700) at vport_if_enqueue+0x19
veb_port_input(851c3800,fd806064c200,,82066600) at veb_port_input+0x4d2
ether_input(851c3800,fd806064c200) at ether_input+0x100
vlan_input(8095a050,fd806064c200,8000255752bc) at vlan_input+0x23d
ether_input(8095a050,fd806064c200) at ether_input+0x85
if_input_process(8095a050,800025575358) at if_input_process+0x6f
ifiq_process(8095a460) at ifiq_process+0x69
taskq_thread(80035080) at taskq_thread+0x100
end trace frame: 0x0, count: -19

ddb{1}> ps /o
    TID    PID    UID  PRFLAGS  PFLAGS  CPU  COMMAND
 422021  80579      0      0x2      07  ifconfig
 292011  89065020x12   0x4008  mariadbd
 427181  89065020x12  0x4006K  mariadbd
  86788  89065020x12   0x4003  mariadbd
 302453  98158      0  0x14000  0x2009  softnet
  88346  66890      0  0x14000  0x2005  softnet

ddb{1}> machine ddbcpu 2
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: 10

ddb{2}> trace
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 2
Invalid cpu 2
ddb{2}> t[A[A Bad character
x86_ipi_db(80001d1c3ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1c3ff0) at sched_idle+0x280
end trace frame: 0x0, count: -5

ddb{2}> machine ddbcpu 3
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: 6

ddb{3}> trace
x86_ipi_db(80001d1ccff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
__mp_lock(823d986c) at __mp_lock+0x72
wakeup_n(8000fffeca88,1) at wakeup_n+0x32
futex_requeue(c13fb4a32e0,1,0,0,2) at futex_requeue+0xe4
sys_futex(8000fffc2008,8000265ca780,8000265ca7d0) at sys_futex+0xe6
syscall(8000265ca840) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0xc13f5b4b090, count: -9

ddb{3}> machine ddbcpu 4
Stopped at	x86_ipi_db+0x12:	leave
x86_ipi_db(80001d1d5ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x203
sched_idle(80001d1d5ff0) at