Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread David Gwynne
On Thu, May 12, 2022 at 08:07:09PM +0200, Hrvoje Popovski wrote:
> On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> > On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> >> On 12.5.2022. 14:48, Claudio Jeker wrote:
> >>> I think the diff below may be enough to fix this issue. It drops the SMR
> >>> critical secition around the enqueue operation but uses a reference on the
> >>> port insteadt to ensure that the device can't be removed during the
> >>> enqueue. Once the enqueue is finished we enter the SMR critical section
> >>> again and drop the reference.
> >>>
> >>> To make it clear that the SMR_TAILQ remains intact while a refcount is
> >>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> >>> not strictly needed since the next pointer remains valid up until the
> >>> smr_barrier() call but I find this a bit easier to understand.
> >>> First make sure nobody else holds a reference then remove the port from
> >>> the list.
> >>>
> >>> I currently have no test setup to verify this but I hope someone else can
> >>> give this a spin.
> >> Hi,
> >>
> >> for now veb seems stable and i can't panic box although it should, but
> >> please give me few more hours to torture it properly.
> > 
> > 
> > I can trigger panic in veb with this diff.
> > 
> > Thank you ..
> > 
> > 
> 
> I can't trigger ... :))) sorry ..

sorry i'm late to the party. can you try this diff?

this diff replaces the list of ports with an array/map of ports.
the map takes references to all the ports, so the forwarding paths
just have to hold a reference to the map to be able to use all the
ports. the forwarding path uses smr to get hold of a map, takes a map
ref, and then leaves the smr crit section before iterating over the map
and pushing packets.

this means we should only take and release a single refcnt when
we're pushing packets out any number of ports.

if no span ports are configured, then there's no span port map and
we don't try and take a ref, we can just return early.

we also only take and release a single refcnt when we forward the
actual packet. forwarding to a single port provided by an etherbridge
lookup already takes/releases the single port ref. if it falls
through that for unknown unicast or broadcast/multicast, then it's
a single refcnt for the current map of all ports.

Index: if_veb.c
===
RCS file: /cvs/src/sys/net/if_veb.c,v
retrieving revision 1.25
diff -u -p -r1.25 if_veb.c
--- if_veb.c4 Jan 2022 06:32:39 -   1.25
+++ if_veb.c13 May 2022 02:01:43 -
@@ -139,13 +139,13 @@ struct veb_port {
struct veb_rule_list p_vr_list[2];
 #define VEB_RULE_LIST_OUT  0
 #define VEB_RULE_LIST_IN   1
-
-   SMR_TAILQ_ENTRY(veb_port)p_entry;
 };
 
 struct veb_ports {
-   SMR_TAILQ_HEAD(, veb_port)   l_list;
-   unsigned int l_count;
+   struct refcntm_refs;
+   unsigned int m_count;
+
+   /* followed by an array of veb_port pointers */
 };
 
 struct veb_softc {
@@ -155,8 +155,8 @@ struct veb_softc {
struct etherbridge   sc_eb;
 
struct rwlocksc_rule_lock;
-   struct veb_ports sc_ports;
-   struct veb_ports sc_spans;
+   struct veb_ports*sc_ports;
+   struct veb_ports*sc_spans;
 };
 
 #define DPRINTF(_sc, fmt...)do { \
@@ -184,8 +184,25 @@ static int veb_p_ioctl(struct ifnet *, u
 static int veb_p_output(struct ifnet *, struct mbuf *,
struct sockaddr *, struct rtentry *);
 
-static voidveb_p_dtor(struct veb_softc *, struct veb_port *,
-   const char *);
+static inline size_t
+veb_ports_size(unsigned int n)
+{
+   /* use of _ALIGN is inspired by CMSGs */
+   return _ALIGN(sizeof(struct veb_ports)) +
+   n * sizeof(struct veb_port *);
+}
+
+static inline struct veb_port **
+veb_ports_array(struct veb_ports *m)
+{
+   return (struct veb_port **)((caddr_t)m + _ALIGN(sizeof(*m)));
+}
+
+static voidveb_ports_free(struct veb_ports *);
+
+static voidveb_p_unlink(struct veb_softc *, struct veb_port *);
+static voidveb_p_fini(struct veb_port *);
+static voidveb_p_dtor(struct veb_softc *, struct veb_port *);
 static int veb_add_port(struct veb_softc *,
const struct ifbreq *, unsigned int);
 static int veb_del_port(struct veb_softc *,
@@ -271,8 +288,8 @@ veb_clone_create(struct if_clone *ifc, i
return (ENOMEM);
 
rw_init(>sc_rule_lock, "vebrlk");
-   SMR_TAILQ_INIT(>sc_ports.l_list);
-   SMR_TAILQ_INIT(>sc_spans.l_list);
+   sc->sc_ports = NULL;
+   sc->sc_spans = NULL;
 
ifp = >sc_if;
 
@@ -314,7 +331,10 @@ static int
 veb_clone_destroy(struct ifnet *ifp)
 {
struct veb_softc *sc = ifp->if_softc;
-   struct 

Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> On 12.5.2022. 16:22, Hrvoje Popovski wrote:
>> On 12.5.2022. 14:48, Claudio Jeker wrote:
>>> I think the diff below may be enough to fix this issue. It drops the SMR
>>> critical secition around the enqueue operation but uses a reference on the
>>> port insteadt to ensure that the device can't be removed during the
>>> enqueue. Once the enqueue is finished we enter the SMR critical section
>>> again and drop the reference.
>>>
>>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>>> not strictly needed since the next pointer remains valid up until the
>>> smr_barrier() call but I find this a bit easier to understand.
>>> First make sure nobody else holds a reference then remove the port from
>>> the list.
>>>
>>> I currently have no test setup to verify this but I hope someone else can
>>> give this a spin.
>> Hi,
>>
>> for now veb seems stable and i can't panic box although it should, but
>> please give me few more hours to torture it properly.
> 
> 
> I can trigger panic in veb with this diff.
> 
> Thank you ..
> 
> 

I can't trigger ... :))) sorry ..



Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> On 12.5.2022. 14:48, Claudio Jeker wrote:
>> I think the diff below may be enough to fix this issue. It drops the SMR
>> critical secition around the enqueue operation but uses a reference on the
>> port insteadt to ensure that the device can't be removed during the
>> enqueue. Once the enqueue is finished we enter the SMR critical section
>> again and drop the reference.
>>
>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>> not strictly needed since the next pointer remains valid up until the
>> smr_barrier() call but I find this a bit easier to understand.
>> First make sure nobody else holds a reference then remove the port from
>> the list.
>>
>> I currently have no test setup to verify this but I hope someone else can
>> give this a spin.
> Hi,
> 
> for now veb seems stable and i can't panic box although it should, but
> please give me few more hours to torture it properly.


I can trigger panic in veb with this diff.

Thank you ..




Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Alexandr Nedvedicky
Hello,

> 
> I think the diff below may be enough to fix this issue. It drops the SMR
> critical secition around the enqueue operation but uses a reference on the
> port insteadt to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
> 
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
> 
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

I agree this diff is far better than mine version. I have not
realized we can actually do a dance with
smr_read_enter()/smr_read_leave(). I believe it should work and
fix a problem. Here is my OK (if my OK counts here).

OK sashan

> -- 
> :wq Claudio
> 
> Index: if_veb.c
> ===
> RCS file: /cvs/src/sys/net/if_veb.c,v
> retrieving revision 1.25
> diff -u -p -r1.25 if_veb.c
> --- if_veb.c  4 Jan 2022 06:32:39 -   1.25
> +++ if_veb.c  12 May 2022 12:06:28 -
> @@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb
>   continue;
>   }
>  
> + veb_eb_brport_take(p);
> + smr_read_leave();
>   if_enqueue(ifp0, m); /* XXX count error */
> + smr_read_enter();
> + veb_eb_brport_rele(p);
>   }
>   smr_read_leave();
>  }
> @@ -904,7 +908,12 @@ veb_broadcast(struct veb_softc *sc, stru
>   continue;
>   }
>  
> +
> + veb_eb_brport_take(tp);
> + smr_read_leave();
>   (*tp->p_enqueue)(ifp0, m); /* XXX count error */
> + smr_read_enter();
> + veb_eb_brport_rele(tp);
>   }
>   smr_read_leave();
>  
> @@ -1934,10 +1943,11 @@ veb_p_dtor(struct veb_softc *sc, struct 
>  
>   port_list = >sc_ports;
>   }
> + refcnt_finalize(>p_refs, "vebpdtor");
> +
>   SMR_TAILQ_REMOVE_LOCKED(_list->l_list, p, p_entry);
>   port_list->l_count--;
>  
> - refcnt_finalize(>p_refs, "vebpdtor");
>   smr_barrier();
>  
>   veb_rule_list_free(TAILQ_FIRST(>p_vrl));



Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Hrvoje Popovski
On 12.5.2022. 14:48, Claudio Jeker wrote:
> I think the diff below may be enough to fix this issue. It drops the SMR
> critical secition around the enqueue operation but uses a reference on the
> port insteadt to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
> 
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
> 
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

Hi,

for now veb seems stable and i can't panic box although it should, but
please give me few more hours to torture it properly.

I'm doing this in loop
ifconfig veb1 destroy
sh /etc/netstart
ifconfig veb0 destroy
sh /etc/netstart
ifconfig vport1 destroy
sh /etc/netstart
ifconfig vport0 destroy
sh /etc/netstart



my config

veb1: flags=a843
index 25 llprio 3
groups: veb
ix1 flags=3
port 2 ifpriority 0 ifcost 0
vport1 flags=3
port 27 ifpriority 0 ifcost 0
veb0: flags=a843
index 26 llprio 3
groups: veb
ix0 flags=3
port 1 ifpriority 0 ifcost 0
vport0 flags=3
port 28 ifpriority 0 ifcost 0
ix2 flags=100
vport1: flags=8943 mtu 1500
lladdr ec:f4:bb:da:f7:fa
index 27 priority 0 llprio 3
groups: vport
inet 192.168.111.11 netmask 0xff00 broadcast 192.168.111.255
vport0: flags=8943 mtu 1500
lladdr ec:f4:bb:da:f7:f8
index 28 priority 0 llprio 3
groups: vport
inet 192.168.100.11 netmask 0xff00 broadcast 192.168.100.255



Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface

2022-05-12 Thread Claudio Jeker
On Wed, May 11, 2022 at 11:01:21AM +0200, Alexandr Nedvedicky wrote:
> Hello Hrvoje,
> 
> thank you for testing.
> On Wed, May 11, 2022 at 10:40:28AM +0200, Hrvoje Popovski wrote:
> > On 10.5.2022. 22:55, Alexander Bluhm wrote:
> > > Yes.  It is similar.
> > > 
> > > I have read the whole mail thread and the final fix got commited.
> > > But it looks incomplete, pf is still sleeping.
> > > 
> > > Hrvoje, can you run the tests again that triggered the panics a
> > > year ago?
> > 
> > Hi,
> > 
> > year ago panics was triggered when veb or tpmr bridged traffic. I've
> > tried that right now and I couldn't trigger that panics.
> > If I put vport and route traffic over veb than I can trigger panic with
> > or without vlans as child-iface for veb.
> > For panic i need to have pf enabled and to run
> > ifconfig veb destroy or ifconfig vlan destroy and sh netstart in loop.
> > 
> > 
> > this is panic without vlans
> > 
> > ddb{4}> show all locks
> > CPU 4:
> > exclusive sched_lock _lock r = 0 (0x824819c0)
> > #0  witness_lock+0x311
> > #1  sleep_setup+0xa5
> > #2  rw_enter+0x211
> > #3  pf_test+0xcf0
> > #4  ip_output+0x6b7
> > #5  ip_forward+0x2da
> > #6  ip_input_if+0x353
> > #7  ipv4_input+0x39
> > #8  ether_input+0x3ad
> > #9  vport_if_enqueue+0x19
> > #10 veb_port_input+0x529
> 
> I suspect we deal with broadcast and this time things start
> to go downhill here in veb_broadcast():
> 
>  875 smr_read_enter();
>  876 SMR_TAILQ_FOREACH(tp, >sc_ports.l_list, p_entry) {
>  877 if (rp == tp || (rp->p_protected & tp->p_protected)) {
>  878 /*
>  879  * don't let Ethernet packets hairpin or
>  880  * move between ports in the same protected
>  881  * domain(s).
>  882  */
>  883 continue;
>  884 }
>  885 
>  886 ifp0 = tp->p_ifp0;
>  887 if (!ISSET(ifp0->if_flags, IFF_RUNNING)) {
>  888 /* don't waste time */
>  889 continue;
>  890 }
>  891 
>  892 if (!ISSET(tp->p_bif_flags, IFBIF_DISCOVER) &&
>  893 !ISSET(m0->m_flags, M_BCAST | M_MCAST)) {
>  894 /* don't flood unknown unicast */
>  895 continue;
>  896 }
>  897 
>  898 if (veb_rule_filter(tp, VEB_RULE_LIST_OUT, m0, src, dst))
>  899 continue;
>  900 
>  901 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
>  902 if (m == NULL) {
>  903 /* XXX count error? */
>  904 continue;
>  905 }
>  906 
>  907 (*tp->p_enqueue)(ifp0, m); /* XXX count error */
>  908 }
>  909 smr_read_leave();
> 
> the veb_broadcast() is being called from veb_port_input():
> 1082 
> 1083 /* unknown unicast address */
> 1084 } else {
> 1085 SET(m->m_flags, ETH64_IS_BROADCAST(dst) ? M_BCAST : 
> M_MCAST);
> 1086 }
> 1087 
> 1088 veb_broadcast(sc, p, m, src, dst);
> 1089 return (NULL);
> 
> I think I've spotted the place yesterday night in code, but could not
> figure out how to wire things up to trigger the panic. Fortunately you've
> managed to figure it out.
> 
> I think vport_if_enqueue() is a trap/land mine here. The function should 
> really
> be queueing a packet instead of dispatching it right away.
> 

I think the diff below may be enough to fix this issue. It drops the SMR
critical secition around the enqueue operation but uses a reference on the
port insteadt to ensure that the device can't be removed during the
enqueue. Once the enqueue is finished we enter the SMR critical section
again and drop the reference.

To make it clear that the SMR_TAILQ remains intact while a refcount is
held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
not strictly needed since the next pointer remains valid up until the
smr_barrier() call but I find this a bit easier to understand.
First make sure nobody else holds a reference then remove the port from
the list.

I currently have no test setup to verify this but I hope someone else can
give this a spin.
-- 
:wq Claudio

Index: if_veb.c
===
RCS file: /cvs/src/sys/net/if_veb.c,v
retrieving revision 1.25
diff -u -p -r1.25 if_veb.c
--- if_veb.c4 Jan 2022 06:32:39 -   1.25
+++ if_veb.c12 May 2022 12:06:28 -
@@ -365,7 +365,11 @@ veb_span(struct veb_softc *sc, struct mb
continue;
}
 
+   veb_eb_brport_take(p);
+   smr_read_leave();
if_enqueue(ifp0, m); /* XXX count error */
+   smr_read_enter();
+