Re: [External] : parallel IP forwarding

2022-04-21 Thread Alexandr Nedvedicky
On Fri, Apr 08, 2022 at 12:56:12PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> Is now the right time to commit the parallel forwarding diff?
> 
> Known limitations are:
> - Hrvoje has seen a crash with both pfsync and ipsec on his production
>   machine.  But he cannot reproduce it in his lab.
> - TCP processing gets slower as we have an additional queue between
>   IP and protocol layer.
> - Protocol layer may starve as 1 exclusive lock is fighting with 4
>   shared locks.  This happens only when forwarding a lot.
> 
> The advantage of committing is that we see how relevant these things
> are in the real world.  But the most important thing is that we learn
> how all the locks behave under MP pressure.  You can add a lot of
> locking, but only when you run in parallel do you see if it is correct.
> 
> An alternative could be to commit it with NET_TASKQ 1.  With only
> one softnet thread I would expect to see fewer bugs, but there is
> also less to learn.  NET_TASKQ 1 could be a safe point where we
> could easily switch back.
> 
> bluhm
> 

I think it's worth giving it a try. Maybe I would just split
it into three commits/diffs:

a diff which adds task queues for forwarding

a diff which adds a KERNEL_LOCK to deal with arp

a diff which deals with local delivery to sockets

It's up to you. All changes look good to me. You have my OK,
in case you want to proceed in one big step.

OK sashan



Re: parallel IP forwarding

2022-04-18 Thread Alexander Bluhm
On Mon, Apr 18, 2022 at 12:27:23PM +0200, Hrvoje Popovski wrote:
> On 8.4.2022. 12:56, Alexander Bluhm wrote:
> > Is now the right time to commit the parallel forwarding diff?
> > 
> > Known limitations are:
> > - Hrvoje has seen a crash with both pfsync and ipsec on his production
> >   machine.  But he cannot reproduce it in his lab.
> 
> This is resolved. At least this panic doesn't happen any more.

Good to hear.  I guess the crash seen before is related to an
uncommitted diff that was tested in Hrvoje's setup.

So we can run IP forwarding in parallel.  I would like to commit
this diff now.  Then we can learn whether there are other limitations
and fix them in tree.

ok?

bluhm

Index: net/if.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.649
diff -u -p -r1.649 if.c
--- net/if.c25 Feb 2022 23:51:03 -  1.649
+++ net/if.c18 Apr 2022 17:12:59 -
@@ -237,7 +237,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -834,15 +834,10 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
-   /*
-* XXXSMP IPsec data structures are not ready to be accessed
-* by multiple network threads in parallel.  In this case
-* use an exclusive lock.
-*/
-   NET_LOCK();
+   NET_RLOCK_IN_SOFTNET();
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +894,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.278
diff -u -p -r1.278 if_ethersubr.c
--- net/if_ethersubr.c  22 Feb 2022 01:15:02 -  1.278
+++ net/if_ethersubr.c  18 Apr 2022 17:12:59 -
@@ -221,7 +221,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -244,7 +247,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -270,13 +276,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
 #endif
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
@@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
case ETHERTYPE_PPPOE:
if (m->m_flags & (M_MCAST | M_BCAST))
goto dropanyway;
+   KERNEL_LOCK();
 #ifdef PIPEX
if (pipex_enable) {
struct pipex_session *session;
 
if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
pipex_pppoe_input(m, session);
+   KERNEL_UNLOCK();
return;
}
}
@@ -542,6 +556,7 @@ ether_input(struct ifnet *ifp, struct mb
pppoe_disc_input(m);
else
pppoe_data_input(m);
+   KERNEL_UNLOCK();
 

Re: parallel IP forwarding

2022-04-18 Thread Hrvoje Popovski
On 8.4.2022. 12:56, Alexander Bluhm wrote:
> Hi,
> 
> > Is now the right time to commit the parallel forwarding diff?
> 
> > Known limitations are:
> - Hrvoje has seen a crash with both pfsync and ipsec on his production
>   machine.  But he cannot reproduce it in his lab.

This is resolved. At least this panic doesn't happen any more.


> - TCP processing gets slower as we have an additional queue between
>   IP and protocol layer.
> - Protocol layer may starve as 1 exclusive lock is fighting with 4
>   shared locks.  This happens only when forwarding a lot.



parallel IP forwarding

2022-04-08 Thread Alexander Bluhm
Hi,

Is now the right time to commit the parallel forwarding diff?

Known limitations are:
- Hrvoje has seen a crash with both pfsync and ipsec on his production
  machine.  But he cannot reproduce it in his lab.
- TCP processing gets slower as we have an additional queue between
  IP and protocol layer.
- Protocol layer may starve as 1 exclusive lock is fighting with 4
  shared locks.  This happens only when forwarding a lot.

The advantage of committing is that we see how relevant these things
are in the real world.  But the most important thing is that we learn
how all the locks behave under MP pressure.  You can add a lot of
locking, but only when you run in parallel do you see if it is correct.

An alternative could be to commit it with NET_TASKQ 1.  With only
one softnet thread I would expect to see fewer bugs, but there is
also less to learn.  NET_TASKQ 1 could be a safe point where we
could easily switch back.
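
As a rough illustration of what NET_TASKQ controls (a simplified
sketch, not the actual if.c code), the idea is that input work is
spread over several softnet task queues, so different interfaces can
be serviced on different CPUs:

	/*
	 * Sketch only: with NET_TASKQ > 1 each interface's input work is
	 * assigned to one of several softnet taskqs; with NET_TASKQ 1
	 * everything funnels through a single softnet thread again.
	 */
	struct taskq *
	net_tq(unsigned int ifindex)
	{
		return (nettqmp[ifindex % NET_TASKQ]);
	}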

bluhm

Index: net/if.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.649
diff -u -p -r1.649 if.c
--- net/if.c25 Feb 2022 23:51:03 -  1.649
+++ net/if.c29 Mar 2022 12:44:05 -
@@ -237,7 +237,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -834,15 +834,10 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
-   /*
-* XXXSMP IPsec data structures are not ready to be accessed
-* by multiple network threads in parallel.  In this case
-* use an exclusive lock.
-*/
-   NET_LOCK();
+   NET_RLOCK_IN_SOFTNET();
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +894,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.278
diff -u -p -r1.278 if_ethersubr.c
--- net/if_ethersubr.c  22 Feb 2022 01:15:02 -  1.278
+++ net/if_ethersubr.c  29 Mar 2022 12:44:05 -
@@ -221,7 +221,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -244,7 +247,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -270,13 +276,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
 #endif
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
@@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
case ETHERTYPE_PPPOE:
if (m->m_flags & (M_MCAST | M_BCAST))
goto dropanyway;
+   KERNEL_LOCK();
 #ifdef PIPEX
if (pipex_enable) {
struct pipex_session *session;
 
if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
pipex_pppoe_input(m, session);
+   KERNEL_UNLOCK();
return;
   

Re: parallel ip forwarding

2021-12-30 Thread Vitaliy Makkoveev



> On 30 Dec 2021, at 11:28, YASUOKA Masahiko  wrote:
> 
> Hi,
> 
> On Sat, 25 Dec 2021 21:50:47 +0300
> Vitaliy Makkoveev  wrote:
>> On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
>>> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
>>>>> - npppd l2pt ipsecflowinfo is not MP safe
>>>>
>>>> Does this mean the things we are discussing on the "Fix
>>>> ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
>>>> another issue.
>>> 
>>> In this mail thread I was concerned about things might get worse.
>>> 
>>> Currently I see these problems:
>>> 
>>> tdb_free() will be called with a shared netlock.  From there
>>> ipsp_ids_free() is called.
>>> 
>>>if (--ids->id_refcount > 0)
>>>return;
>>> 
>>> This ref count needs to be atomic.
>>> 
>>>if (LIST_EMPTY(&ipsp_ids_gc_list))
>>>timeout_add_sec(&ipsp_ids_gc_timeout, 1);
>>>LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
>>> 
>>> And some mutex should protect ipsp_ids_gc_list.
> 
> Thanks, I suppose I could catch up with the problem.
> 
>> The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
>> list and trees. ipsp_ids_lookup() returns `ids' with bumped reference
>> counter.
> 
> This direction seems good.
> 
> One thing, I found a problem.
> 
>> Index: sys/netinet/ip_spd.c
>> ===
>> RCS file: /cvs/src/sys/netinet/ip_spd.c,v
>> retrieving revision 1.110
>> diff -u -p -r1.110 ip_spd.c
>> --- sys/netinet/ip_spd.c 16 Dec 2021 15:38:03 -  1.110
>> +++ sys/netinet/ip_spd.c 25 Dec 2021 18:34:22 -
>> @@ -418,6 +418,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>>  /* Cached entry is good. */
>>  error = ipsp_spd_inp(m, inp, ipo, tdbout);
>>  mtx_leave(_tdb_mtx);
>> +ipsp_ids_free(ids);
>>  return error;
>> 
>>   nomatchout:
>> @@ -452,6 +453,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>>  dignore ?  : >ipo_dst,
>>  ipo->ipo_sproto, ids ? ids: ipo->ipo_ids,
>>  >ipo_addr, >ipo_mask);
>> +ipsp_ids_free(ids);
>>  mtx_enter(_tdb_mtx);
>>  if ((tdbp_new != NULL) &&
>>  (tdbp_new->tdb_flags & TDBF_DELETED)) {
> 
> ids will remain unfreed since there are some code paths which don't
> pass the above lines.
> 

You are right, I missed that.

> I tried to fix that, but adding a lot of ipsp_ids_free() calls looks like a mess.
> Instead, how about changing ipsp_spd_lookup() to take a "struct
> ipsec_ids *ids" as an argument and letting the caller take the
> responsibility for the ids?
> 

Seems reasonable. ok mvs@


> Index: sys/net/if_bridge.c
> ===
> RCS file: /disk/cvs/openbsd/src/sys/net/if_bridge.c,v
> retrieving revision 1.362
> diff -u -p -r1.362 if_bridge.c
> --- sys/net/if_bridge.c   23 Dec 2021 12:21:48 -  1.362
> +++ sys/net/if_bridge.c   30 Dec 2021 08:12:18 -
> @@ -1595,7 +1595,7 @@ bridge_ipsec(struct ifnet *ifp, struct e
>   }
>   } else { /* Outgoing from the bridge. */
>   error = ipsp_spd_lookup(m, af, hlen, IPSP_DIRECTION_OUT,
> - NULL, NULL, , 0);
> + NULL, NULL, , NULL);
>   if (error == 0 && tdb != NULL) {
>   /*
>* We don't need to do loop detection, the
> Index: sys/net/if_veb.c
> ===
> RCS file: /disk/cvs/openbsd/src/sys/net/if_veb.c,v
> retrieving revision 1.21
> diff -u -p -r1.21 if_veb.c
> --- sys/net/if_veb.c  8 Nov 2021 04:15:46 -   1.21
> +++ sys/net/if_veb.c  30 Dec 2021 08:12:18 -
> @@ -746,7 +746,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f
> #endif
> 
>   tdb = ipsp_spd_lookup(m, af, iphlen, , IPSP_DIRECTION_OUT,
> - NULL, NULL, 0);
> + NULL, NULL, NULL);
>   if (tdb == NULL)
>   return (m);
> 
> Index: sys/netinet/ip_ipsp.c
> ===
> RCS file: /disk/cvs/openbsd/src/sys/netinet/ip_ipsp.c,v
> retrieving revision 1.267
> diff -u -p -r1.267 ip_ipsp.c
> --- sys/netinet/ip_ipsp.c 20 Dec 2021 15:59:09 -  1.267
> +++ sys/netinet/ip_ipsp.c 30 Dec 2021 08:12:18 -
> @@ -47,6 +47,8 @@
> #include 
> #include 
> #include 
> +#include 
> +#include 
> 
> #include 
> #include 
> @@ -84,6 +86,13 @@ void tdb_hashstats(void);
>   do { } while (0)
> #endif
> 
> +/*
> + * Locks used to protect global data and struct members:
> + *   F   ipsec_flows_mtx
> + */
> +
> +struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
> +
> int   tdb_rehash(void);
> void  tdb_timeout(void *);
> void  tdb_firstuse(void *);

Re: parallel ip forwarding

2021-12-30 Thread YASUOKA Masahiko
Hi,

On Sat, 25 Dec 2021 21:50:47 +0300
Vitaliy Makkoveev  wrote:
> On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
>> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
>> > > - npppd l2pt ipsecflowinfo is not MP safe
>> > 
>> > Does this mean the things we are discussing on the "Fix
>> > ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
>> > another issue.
>> 
>> In this mail thread I was concerned about things might get worse.
>> 
>> Currently I see these problems:
>> 
>> tdb_free() will be called with a shared netlock.  From there
>> ipsp_ids_free() is called.
>> 
>> if (--ids->id_refcount > 0)
>> return;
>> 
>> This ref count needs to be atomic.
>> 
>> if (LIST_EMPTY(&ipsp_ids_gc_list))
>> timeout_add_sec(&ipsp_ids_gc_timeout, 1);
>> LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
>> 
>> And some mutex should protect ipsp_ids_gc_list.

Thanks, I suppose I could catch up with the problem.

> The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
> list and trees. ipsp_ids_lookup() returns `ids' with bumped reference
> counter.

This direction seems good.

One thing, I found a problem.

> Index: sys/netinet/ip_spd.c
> ===
> RCS file: /cvs/src/sys/netinet/ip_spd.c,v
> retrieving revision 1.110
> diff -u -p -r1.110 ip_spd.c
> --- sys/netinet/ip_spd.c  16 Dec 2021 15:38:03 -  1.110
> +++ sys/netinet/ip_spd.c  25 Dec 2021 18:34:22 -
> @@ -418,6 +418,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>   /* Cached entry is good. */
>   error = ipsp_spd_inp(m, inp, ipo, tdbout);
>   mtx_leave(_tdb_mtx);
> + ipsp_ids_free(ids);
>   return error;
>  
>nomatchout:
> @@ -452,6 +453,7 @@ ipsp_spd_lookup(struct mbuf *m, int af, 
>   dignore ?  : >ipo_dst,
>   ipo->ipo_sproto, ids ? ids: ipo->ipo_ids,
>   >ipo_addr, >ipo_mask);
> + ipsp_ids_free(ids);
>   mtx_enter(_tdb_mtx);
>   if ((tdbp_new != NULL) &&
>   (tdbp_new->tdb_flags & TDBF_DELETED)) {

ids will remain unfreed since there are some code paths which don't
pass the above lines.

I tried to fix that, but adding a lot of ipsp_ids_free() calls looks like a mess.
Instead, how about changing ipsp_spd_lookup() to take a "struct
ipsec_ids *ids" as an argument and letting the caller take the
responsibility for the ids?

Index: sys/net/if_bridge.c
===
RCS file: /disk/cvs/openbsd/src/sys/net/if_bridge.c,v
retrieving revision 1.362
diff -u -p -r1.362 if_bridge.c
--- sys/net/if_bridge.c 23 Dec 2021 12:21:48 -  1.362
+++ sys/net/if_bridge.c 30 Dec 2021 08:12:18 -
@@ -1595,7 +1595,7 @@ bridge_ipsec(struct ifnet *ifp, struct e
}
} else { /* Outgoing from the bridge. */
error = ipsp_spd_lookup(m, af, hlen, IPSP_DIRECTION_OUT,
-   NULL, NULL, , 0);
+   NULL, NULL, , NULL);
if (error == 0 && tdb != NULL) {
/*
 * We don't need to do loop detection, the
Index: sys/net/if_veb.c
===
RCS file: /disk/cvs/openbsd/src/sys/net/if_veb.c,v
retrieving revision 1.21
diff -u -p -r1.21 if_veb.c
--- sys/net/if_veb.c8 Nov 2021 04:15:46 -   1.21
+++ sys/net/if_veb.c30 Dec 2021 08:12:18 -
@@ -746,7 +746,7 @@ veb_ipsec_proto_out(struct mbuf *m, sa_f
 #endif
 
tdb = ipsp_spd_lookup(m, af, iphlen, , IPSP_DIRECTION_OUT,
-   NULL, NULL, 0);
+   NULL, NULL, NULL);
if (tdb == NULL)
return (m);
 
Index: sys/netinet/ip_ipsp.c
===
RCS file: /disk/cvs/openbsd/src/sys/netinet/ip_ipsp.c,v
retrieving revision 1.267
diff -u -p -r1.267 ip_ipsp.c
--- sys/netinet/ip_ipsp.c   20 Dec 2021 15:59:09 -  1.267
+++ sys/netinet/ip_ipsp.c   30 Dec 2021 08:12:18 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -84,6 +86,13 @@ void tdb_hashstats(void);
do { } while (0)
 #endif
 
+/*
+ * Locks used to protect global data and struct members:
+ * F   ipsec_flows_mtx
+ */
+
+struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
+
 inttdb_rehash(void);
 void   tdb_timeout(void *);
 void   tdb_firstuse(void *);
@@ -98,16 +107,16 @@ int ipsec_ids_idle = 100;  /* keep free 
 struct pool tdb_pool;
 
 /* Protected by the NET_LOCK(). */
-u_int32_t ipsec_ids_next_flow = 1; /* may not be zero */
-struct ipsec_ids_tree ipsec_ids_tree;
-struct ipsec_ids_flows 

Re: parallel ip forwarding

2021-12-27 Thread Hrvoje Popovski
On 25.12.2021. 18:52, Alexander Bluhm wrote:
> On Sat, Dec 25, 2021 at 09:24:07AM +0100, Hrvoje Popovski wrote:
>> On 24.12.2021. 0:55, Alexander Bluhm wrote:
>>> I think we can remove the ipsec_in_use workaround now.  The IPsec
>>> path is protected with the kernel lock.
>>>
>>> There are some issues left:
>>> - npppd l2pt ipsecflowinfo is not MP safe
>>> - the acquire SA feature is not MP safe
>>> - Hrvoje has seen a panic with sasync
>>>
>>> If you use one of these, the diff below should trigger crashes faster.
>>> If you use only regular IPsec or forwarding, I hope it is stable.
>> Hi,
>>
>> After hitting the sasyncd setup with this diff for some time I ran
>> ipsecctl -sa just to see what's going on, and the box panicked.
> According to ddb output ipsecctl is running in userland and the
> panic is triggered by a kernel timeout.  Could it be coincidence?
> When I run with 4 softnet threads, I see userland starvation.  Maybe
> both timeout and ipsecctl wait for the exclusive netlock and are
> executed shortly after each other.
> 
>> r620-1# ipsecctl -sa
>> uvm_fault(0x82200c18, 0x417, 0, 2) -> e
>> kernel: page fault trap, code=0
>> Stopped at  pfsync_delete_tdb+0x84: movq%rcx,0x8(%rsi)
>> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>>  290490  40316  0 0x3  03  ipsecctl
>>   10869  22801 680x10  05  sasyncd
>>  504041  13202 680x10   0x801  isakmpd
>>  476980   6400  00x10  02  ntpd
>>  224100  72648  0 0x14000  0x2004  reaper
>> * 75659  10211  0 0x14000 0x42000K softclock
>> pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
>> tdb_free(812e8008) at tdb_free+0x67
>> tdb_timeout(812e8008) at tdb_timeout+0x7e
>> softclock_thread(8000f260) at softclock_thread+0x13b
>> end trace frame: 0x0, count: 11
>> https://www.openbsd.org/ddb.html describes the minimum info required in
>> bug reports.  Insufficient info makes it difficult to find and fix bugs.
>> ddb{0}>
> /home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2519
> 5f30:   49 8b 86 10 04 00 00mov0x410(%r14),%rax
> 5f37:   49 8b 8e 18 04 00 00mov0x418(%r14),%rcx
> 5f3e:   49 8d 94 24 e8 09 00lea0x9e8(%r12),%rdx
> 5f45:   00
> 5f46:   48 8d b0 10 04 00 00lea0x410(%rax),%rsi
> 5f4d:   48 85 c0test   %rax,%rax
> *   5f50:   48 0f 44 f2 cmove  %rdx,%rsi
> 5f54:   48 89 4e 08 mov%rcx,0x8(%rsi)
> 5f58:   49 8b 86 10 04 00 00mov0x410(%r14),%rax
> 5f5f:   49 8b 8e 18 04 00 00mov0x418(%r14),%rcx
> 5f66:   48 89 01mov%rax,(%rcx)
> 5f69:   49 c7 86 18 04 00 00movq   $0x,0x418(%r14)
> 5f70:   ff ff ff ff
> 5f74:   49 c7 86 10 04 00 00movq   $0x,0x410(%r14)
> 5f7b:   ff ff ff ff
> /home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2520
> 
>   2508  void
>   2509  pfsync_delete_tdb(struct tdb *t)
>   2510  {
>   2511  struct pfsync_softc *sc = pfsyncif;
>   2512  size_t nlen;
>   2513
>   2514  if (sc == NULL || !ISSET(t->tdb_flags, TDBF_PFSYNC))
>   2515  return;
>   2516
>   2517  mtx_enter(&sc->sc_tdb_mtx);
>   2518
> * 2519  TAILQ_REMOVE(&sc->sc_tdb_q, t, tdb_sync_entry);
>   2520  CLR(t->tdb_flags, TDBF_PFSYNC);
>   2521
>   2522  nlen = sizeof(struct pfsync_tdb);
>   2523  if (TAILQ_EMPTY(&sc->sc_tdb_q))
>   2524  nlen += sizeof(struct pfsync_subheader);
>   2525  atomic_sub_long(&sc->sc_len, nlen);
>   2526
>   2527  mtx_leave(&sc->sc_tdb_mtx);
>   2528  }
> 
> Most sc_tdb_q accesses are protected by sc_tdb_mtx.  But pfsync_drop_snapshot()
> and pfsync_sendout() do not take it.
> 
>> rsi0x40f
>> rcx   0x
>> rax   0x
> tqe_next and tqe_prev are -1.  Looks like a double remove.  Can
> pfsync_delete_tdb() be called twice?  The tdb_refcnt should enforce
> that tdb_free() is only called once per TDB.
> 
>>  flags: d1040
> Flags look reasonable.  But a problem could be that they are not
> protected in if_pfsync.  Then set or cleared bits may be lost.
> 
> Fix both missing mutexes.  Also put the tdb_sync_entry
> modification and the TDBF_PFSYNC flag under the same sc_tdb_mtx.
> 
> I still have no pf sync setup.  Hrvoje, can you try this?

Hi,

I think this panic is a coincidence, but I can't prove it because
it's not so easy to reproduce. I have been hitting the sasync setup with
this diff for 2 days and I can panic boxes...
If I try to reproduce the panic without this diff, would it give you some
more information or hints regarding IPsec and MP forwarding?




Re: parallel ip forwarding

2021-12-25 Thread Vitaliy Makkoveev
On Fri, Dec 24, 2021 at 12:50:23PM +0100, Alexander Bluhm wrote:
> On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
> > > - npppd l2pt ipsecflowinfo is not MP safe
> > 
> > Does this mean the things we are discussing on the "Fix
> > ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
> > another issue.
> 
> In this mail thread I was concerned about things might get worse.
> 
> Currently I see these problems:
> 
> tdb_free() will be called with a shared netlock.  From there
> ipsp_ids_free() is called.
> 
> if (--ids->id_refcount > 0)
> return;
> 
> This ref count needs to be atomic.
> 
> if (LIST_EMPTY(&ipsp_ids_gc_list))
> timeout_add_sec(&ipsp_ids_gc_timeout, 1);
> LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
> 
> And some mutex should protect ipsp_ids_gc_list.
> 
> bluhm
> 

The diff below adds `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*'
list and trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter.

Index: sys/netinet/ip_ipsp.c
===
RCS file: /cvs/src/sys/netinet/ip_ipsp.c,v
retrieving revision 1.267
diff -u -p -r1.267 ip_ipsp.c
--- sys/netinet/ip_ipsp.c   20 Dec 2021 15:59:09 -  1.267
+++ sys/netinet/ip_ipsp.c   25 Dec 2021 18:34:22 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -84,6 +86,13 @@ void tdb_hashstats(void);
do { } while (0)
 #endif
 
+/*
+ * Locks used to protect global data and struct members:
+ * F   ipsec_flows_mtx
+ */
+
+struct mutex ipsec_flows_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
+
 inttdb_rehash(void);
 void   tdb_timeout(void *);
 void   tdb_firstuse(void *);
@@ -98,16 +107,16 @@ int ipsec_ids_idle = 100;  /* keep free 
 struct pool tdb_pool;
 
 /* Protected by the NET_LOCK(). */
-u_int32_t ipsec_ids_next_flow = 1; /* may not be zero */
-struct ipsec_ids_tree ipsec_ids_tree;
-struct ipsec_ids_flows ipsec_ids_flows;
+u_int32_t ipsec_ids_next_flow = 1; /* [F] may not be zero */
+struct ipsec_ids_tree ipsec_ids_tree;  /* [F] */
+struct ipsec_ids_flows ipsec_ids_flows;/* [F] */
 struct ipsec_policy_head ipsec_policy_head =
 TAILQ_HEAD_INITIALIZER(ipsec_policy_head);
 
 void ipsp_ids_gc(void *);
 
 LIST_HEAD(, ipsec_ids) ipsp_ids_gc_list =
-LIST_HEAD_INITIALIZER(ipsp_ids_gc_list);
+LIST_HEAD_INITIALIZER(ipsp_ids_gc_list);   /* [F] */
 struct timeout ipsp_ids_gc_timeout =
 TIMEOUT_INITIALIZER_FLAGS(ipsp_ids_gc, NULL, TIMEOUT_PROC);
 
@@ -1191,21 +1200,25 @@ ipsp_ids_insert(struct ipsec_ids *ids)
struct ipsec_ids *found;
u_int32_t start_flow;
 
-   NET_ASSERT_LOCKED();
+   mtx_enter(&ipsec_flows_mtx);
 
found = RBT_INSERT(ipsec_ids_tree, &ipsec_ids_tree, ids);
if (found) {
/* if refcount was zero, then timeout is running */
-   if (found->id_refcount++ == 0) {
+   if (atomic_inc_int_nv(&found->id_refcount) == 1) {
LIST_REMOVE(found, id_gc_list);
 
if (LIST_EMPTY(&ipsp_ids_gc_list))
timeout_del(&ipsp_ids_gc_timeout);
}
+   mtx_leave(&ipsec_flows_mtx);
DPRINTF("ids %p count %d", found, found->id_refcount);
return found;
}
+
+   ids->id_refcount = 1;
ids->id_flow = start_flow = ipsec_ids_next_flow;
+
if (++ipsec_ids_next_flow == 0)
ipsec_ids_next_flow = 1;
while (RBT_INSERT(ipsec_ids_flows, &ipsec_ids_flows, ids) != NULL) {
@@ -1214,12 +1227,13 @@ ipsp_ids_insert(struct ipsec_ids *ids)
ipsec_ids_next_flow = 1;
if (ipsec_ids_next_flow == start_flow) {
RBT_REMOVE(ipsec_ids_tree, &ipsec_ids_tree, ids);
+   mtx_leave(&ipsec_flows_mtx);
DPRINTF("ipsec_ids_next_flow exhausted %u",
-   ipsec_ids_next_flow);
+   start_flow);
return NULL;
}
}
-   ids->id_refcount = 1;
+   mtx_leave(&ipsec_flows_mtx);
DPRINTF("new ids %p flow %u", ids, ids->id_flow);
return ids;
 }
@@ -1228,11 +1242,16 @@ struct ipsec_ids *
 ipsp_ids_lookup(u_int32_t ipsecflowinfo)
 {
struct ipsec_ids key;
-
-   NET_ASSERT_LOCKED();
+   struct ipsec_ids *ids;
 
key.id_flow = ipsecflowinfo;
-   return RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
+
+   mtx_enter(&ipsec_flows_mtx);
+   ids = RBT_FIND(ipsec_ids_flows, &ipsec_ids_flows, &key);
+   atomic_inc_int(&ids->id_refcount);
+   mtx_leave(&ipsec_flows_mtx);
+
+   return ids;
 }
 
 /* free ids only from delayed timeout */
@@ -1241,7 +1260,7 @@ ipsp_ids_gc(void *arg)
 {
struct ipsec_ids *ids, *tids;
 
-   NET_LOCK();
+   mtx_enter(&ipsec_flows_mtx);
 

Re: parallel ip forwarding

2021-12-25 Thread Alexander Bluhm
On Sat, Dec 25, 2021 at 09:24:07AM +0100, Hrvoje Popovski wrote:
> On 24.12.2021. 0:55, Alexander Bluhm wrote:
> > I think we can remove the ipsec_in_use workaround now.  The IPsec
> > path is protected with the kernel lock.
> > 
> > There are some issues left:
> > - npppd l2pt ipsecflowinfo is not MP safe
> > - the acquire SA feature is not MP safe
> > - Hrvoje has seen a panic with sasync
> > 
> > If you use one of these, the diff below should trigger crashes faster.
> > If you use only regular IPsec or forwarding, I hope it is stable.
> 
> Hi,
> 
> After hitting the sasyncd setup with this diff for some time I ran
> ipsecctl -sa just to see what's going on, and the box panicked.

According to ddb output ipsecctl is running in userland and the
panic is triggered by a kernel timeout.  Could it be coincidence?
When I run with 4 softnet threads, I see userland starvation.  Maybe
both timeout and ipsecctl wait for the exclusive netlock and are
executed shortly after each other.

> r620-1# ipsecctl -sa
> uvm_fault(0x82200c18, 0x417, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  pfsync_delete_tdb+0x84: movq%rcx,0x8(%rsi)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  290490  40316  0 0x3  03  ipsecctl
>   10869  22801 680x10  05  sasyncd
>  504041  13202 680x10   0x801  isakmpd
>  476980   6400  00x10  02  ntpd
>  224100  72648  0 0x14000  0x2004  reaper
> * 75659  10211  0 0x14000 0x42000K softclock
> pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
> tdb_free(812e8008) at tdb_free+0x67
> tdb_timeout(812e8008) at tdb_timeout+0x7e
> softclock_thread(8000f260) at softclock_thread+0x13b
> end trace frame: 0x0, count: 11
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}>

/home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2519
5f30:   49 8b 86 10 04 00 00mov0x410(%r14),%rax
5f37:   49 8b 8e 18 04 00 00mov0x418(%r14),%rcx
5f3e:   49 8d 94 24 e8 09 00lea0x9e8(%r12),%rdx
5f45:   00 
5f46:   48 8d b0 10 04 00 00lea0x410(%rax),%rsi
5f4d:   48 85 c0test   %rax,%rax
*   5f50:   48 0f 44 f2 cmove  %rdx,%rsi
5f54:   48 89 4e 08 mov%rcx,0x8(%rsi)
5f58:   49 8b 86 10 04 00 00mov0x410(%r14),%rax
5f5f:   49 8b 8e 18 04 00 00mov0x418(%r14),%rcx
5f66:   48 89 01mov%rax,(%rcx)
5f69:   49 c7 86 18 04 00 00movq   $0x,0x418(%r14)
5f70:   ff ff ff ff 
5f74:   49 c7 86 10 04 00 00movq   $0x,0x410(%r14)
5f7b:   ff ff ff ff 
/home/bluhm/openbsd/cvs/src/sys/net/if_pfsync.c:2520

  2508  void
  2509  pfsync_delete_tdb(struct tdb *t)
  2510  {
  2511  struct pfsync_softc *sc = pfsyncif;
  2512  size_t nlen;
  2513  
  2514  if (sc == NULL || !ISSET(t->tdb_flags, TDBF_PFSYNC))
  2515  return;
  2516  
  2517  mtx_enter(&sc->sc_tdb_mtx);
  2518
* 2519  TAILQ_REMOVE(&sc->sc_tdb_q, t, tdb_sync_entry);
  2520  CLR(t->tdb_flags, TDBF_PFSYNC);
  2521
  2522  nlen = sizeof(struct pfsync_tdb);
  2523  if (TAILQ_EMPTY(&sc->sc_tdb_q))
  2524  nlen += sizeof(struct pfsync_subheader);
  2525  atomic_sub_long(&sc->sc_len, nlen);
  2526
  2527  mtx_leave(&sc->sc_tdb_mtx);
  2528  }

Most sc_tdb_q accesses are protected by sc_tdb_mtx.  But pfsync_drop_snapshot()
and pfsync_sendout() do not take it.

> rsi0x40f
> rcx   0x
> rax   0x

tqe_next and tqe_prev are -1.  Looks like a double remove.  Can
pfsync_delete_tdb() be called twice?  The tdb_refcnt should enforce
that tdb_free() is only called once per TDB.

>  flags: d1040

Flags look reasonable.  But a problem could be that they are not
protected in if_pfsync.  Then set or cleared bits may be lost.

Fix both missing mutexes.  Also put the tdb_sync_entry
modification and the TDBF_PFSYNC flag under the same sc_tdb_mtx.
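
A minimal sketch of that shape (illustrative only, not the committed
diff; it assumes the TDBF_PFSYNC test can simply move under the mutex):

	void
	pfsync_delete_tdb(struct tdb *t)
	{
		struct pfsync_softc *sc = pfsyncif;
		size_t nlen;

		if (sc == NULL)
			return;

		mtx_enter(&sc->sc_tdb_mtx);
		/* test and clear the flag under the same lock as the queue */
		if (!ISSET(t->tdb_flags, TDBF_PFSYNC)) {
			mtx_leave(&sc->sc_tdb_mtx);
			return;
		}
		TAILQ_REMOVE(&sc->sc_tdb_q, t, tdb_sync_entry);
		CLR(t->tdb_flags, TDBF_PFSYNC);

		nlen = sizeof(struct pfsync_tdb);
		if (TAILQ_EMPTY(&sc->sc_tdb_q))
			nlen += sizeof(struct pfsync_subheader);
		atomic_sub_long(&sc->sc_len, nlen);

		mtx_leave(&sc->sc_tdb_mtx);
	}

That way two CPUs cannot both see TDBF_PFSYNC set and remove the same
tdb from sc_tdb_q twice.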

I still have no pf sync setup.  Hrvoje, can you try this?

ok?

bluhm

Index: net/if_pfsync.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.299
diff -u -p -r1.299 if_pfsync.c
--- net/if_pfsync.c 25 Nov 2021 13:46:02 -  1.299
+++ net/if_pfsync.c 25 Dec 2021 17:46:06 -
@@ -295,7 +295,7 @@ voidpfsync_bulk_update(void *);
 void   pfsync_bulk_fail(void *);
 
 void   pfsync_grab_snapshot(struct pfsync_snapshot *, struct pfsync_softc *);
-void   pfsync_drop_snapshot(struct pfsync_snapshot *);
+void   pfsync_drop_snapshot(struct 

Re: parallel ip forwarding

2021-12-25 Thread Hrvoje Popovski
On 24.12.2021. 0:55, Alexander Bluhm wrote:
> I think we can remove the ipsec_in_use workaround now.  The IPsec
> path is protected with the kernel lock.
> 
> There are some issues left:
> - npppd l2pt ipsecflowinfo is not MP safe
> - the acquire SA feature is not MP safe
> - Hrvoje has seen a panic with sasync
> 
> If you use one of these, the diff below should trigger crashes faster.
> If you use only regular IPsec or forwarding, I hope it is stable.

Hi,

After hitting the sasyncd setup with this diff for some time I ran
ipsecctl -sa just to see what's going on, and the box panicked.


r620-1# ipsecctl -sa
uvm_fault(0x82200c18, 0x417, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  pfsync_delete_tdb+0x84: movq%rcx,0x8(%rsi)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 290490  40316  0 0x3  03  ipsecctl
  10869  22801 680x10  05  sasyncd
 504041  13202 680x10   0x801  isakmpd
 476980   6400  00x10  02  ntpd
 224100  72648  0 0x14000  0x2004  reaper
* 75659  10211  0 0x14000 0x42000K softclock
pfsync_delete_tdb(812e8008) at pfsync_delete_tdb+0x84
tdb_free(812e8008) at tdb_free+0x67
tdb_timeout(812e8008) at tdb_timeout+0x7e
softclock_thread(8000f260) at softclock_thread+0x13b
end trace frame: 0x0, count: 11
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{0}>



ddb{0}> show reg
rdi  0x4
rsi0x40f
rbp   0x800022c4e390
rbx0
rdx   0x806b39e8
rcx   0x
rax   0x
r8  0x1f
r9 0
r10   0xbaf844ce8eec335d
r11   0xb9858c0c287d2c4d
r12   0x806b3000
r13   0x821c5e60timeout_proc
r14   0x812e8008
r15   0x806b39f8
rip   0x8131a254pfsync_delete_tdb+0x84
cs   0x8
rflags   0x10286__ALIGN_SIZE+0xf286
rsp   0x800022c4e360
ss  0x10
pfsync_delete_tdb+0x84: movq%rcx,0x8(%rsi)
ddb{0}>


ddb{0}> show all tdb
0x812e8008: f9f247f0 192.168.42.100->192.168.42.112:50 #0 000d1040
0x812e8428: 959c114b 192.168.42.112->192.168.42.100:50 #2 1002
0x812e8848: b7eb65bc 192.168.42.113->192.168.42.100:50 #2 1002
0x812e9ce8: 55495192 192.168.42.100->192.168.42.113:50 #3 000d1002
ddb{0}>


ddb{0}> show tdb /f 0x812e8008
tdb at 0x812e8008
 hnext: 0x0
 dnext: 0x0
 snext: 0x0
 inext: 0x0
 onext: 0x0
 xform: 0x0
refcnt: 0
   encalgxform: 0x81f36090
  authalgxform: 0x81f36380
  compalgxform: 0x0
 flags: d1040
   seq: 8
   exp_allocations: 0
  soft_allocations: 0
   cur_allocations: 0
 exp_bytes: 0
soft_bytes: 0
 cur_bytes: 1272048336570
   exp_timeout: 1200
  soft_timeout: 1080
   established: 1640372736
 first_use: 1640372754
soft_first_use: 0
 exp_first_use: 0
 last_used: 1640419926
   last_marked: 0
  cryptoid: 0
   tdb_spi: f9f247f0
 amxkeylen: 20
 emxkeylen: 20
 ivlen: 8
sproto: 50
   wnd: 16
satype: 2
   updates: 158
   dst: 192.168.42.112
   src: 192.168.42.100
amxkey: 0x0
emxkey: 0x0
   rpl: 5256398086
   ids: 0x812d8800
   ids_swapped: 0
   mtu: 0
mtutimeout: 0
 udpencap_port: 0
   tag: 0
   tap: 0
   rdomain: 0
  rdomain_post: 0
ddb{0}>

   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 40316  290490   5050  0  7 0x3ipsecctl
 22801   10869  51239 68  70x10sasyncd
 51239   39174  1  0  30x80  kqreadsasyncd
 13202  504041  43160 68  70x90isakmpd
 43160   53413  1  0  30x80  netio isakmpd
  5050   12722  1  0  30x10008b  sigsusp   ksh
 64387  345990  1  0  30x100098  poll  cron
 58819  219224  16427 95  30x100092  kqreadsmtpd
 67207  340852  16427103  30x100092  kqreadsmtpd
 97983  457473  16427 95  30x100092  kqreadsmtpd
  6608   99059  16427 95  30x100092  kqreadsmtpd
 91670  313820  16427 95  30x100092  kqreadsmtpd
  7571   97218  16427 95  30x100092  

Re: parallel ip forwarding

2021-12-24 Thread Vitaliy Makkoveev
The `rtt_link' list also lacks protection when rtfree() is called
outside the netlock.

> On 25 Dec 2021, at 01:34, Alexander Bluhm  wrote:
> 
> On Fri, Dec 24, 2021 at 02:04:17PM +0100, Alexander Bluhm wrote:
>> On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote:
>>> If you use only regular IPsec or forwarding, I hope it is stable.
>> 
>> false hope
>> 
>> rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at 
>> rt_timer_
>> add+0xc7
>> icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174
>> ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) 
>> at i
>> p_output_ipsec_pmtu_update+0x71
>> ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) 
>> at i
>> p_output_ipsec_send+0x231
>> ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at 
>> ip_out
>> put+0x7ca
>> ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at 
>> ip_forward+
>> 0x2da
>> ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at 
>> ip_input
>> _if+0x353
>> ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39
>> ether_input(8011e048,fd80b8735000) at ether_input+0x3ad
>> if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f
>> ifiq_process(8011dc00) at ifiq_process+0x69
>> taskq_thread(8002e200) at taskq_thread+0x100
>> end trace frame: 0x0, count: -12
> 
> /usr/src/sys/net/route.c:1491
>3773:   49 8b 0fmov(%r15),%rcx
>3776:   49 8b 47 08 mov0x8(%r15),%rax
>377a:   48 85 c9test   %rcx,%rcx
>377d:   74 06   je 3785 
>377f:   48 83 c1 08 add$0x8,%rcx
>3783:   eb 08   jmp378d 
>3785:   49 8b 4f 20 mov0x20(%r15),%rcx
>3789:   48 83 c1 18 add$0x18,%rcx
>378d:   48 89 01mov%rax,(%rcx)
>3790:   49 8b 07mov(%r15),%rax
>3793:   49 8b 4f 08 mov0x8(%r15),%rcx
> *   3797:   48 89 01mov%rax,(%rcx)
>379a:   49 c7 47 08 ff ff ffmovq   $0x,0x8(%r15)
>37a1:   ff 
>37a2:   49 c7 07 ff ff ff ffmovq   $0x,(%r15)
> /usr/src/sys/net/route.c:1492
> 
>  1484  /*
>  1485   * If there's already a timer with this action, destroy it before
>  1486   * we add a new one.
>  1487   */
>  1488  LIST_FOREACH(r, &rt->rt_timer, rtt_link) {
>  1489  if (r->rtt_func == func) {
>  1490  LIST_REMOVE(r, rtt_link);
> * 1491  TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
>  1492  if (r->rtt_queue->rtq_count > 0)
>  1493  r->rtt_queue->rtq_count--;
>  1494  else
>  1495  printf("rt_timer_add: rtq_count reached 0\n");
>  1496  pool_put(&rttimer_pool, r);
>  1497  break;  /* only one per list, so we can quit... */
>  1498  }
>  1499  }
> 
> These lists don't look very MP safe.
> 



Re: parallel ip forwarding

2021-12-24 Thread Alexander Bluhm
On Fri, Dec 24, 2021 at 02:04:17PM +0100, Alexander Bluhm wrote:
> On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote:
> > If you use only regular IPsec or forwarding, I hope it is stable.
> 
> false hope
> 
> rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at 
> rt_timer_
> add+0xc7
> icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174
> ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) 
> at i
> p_output_ipsec_pmtu_update+0x71
> ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) at 
> i
> p_output_ipsec_send+0x231
> ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at 
> ip_out
> put+0x7ca
> ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at 
> ip_forward+
> 0x2da
> ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at 
> ip_input
> _if+0x353
> ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39
> ether_input(8011e048,fd80b8735000) at ether_input+0x3ad
> if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f
> ifiq_process(8011dc00) at ifiq_process+0x69
> taskq_thread(8002e200) at taskq_thread+0x100
> end trace frame: 0x0, count: -12

/usr/src/sys/net/route.c:1491
3773:   49 8b 0fmov(%r15),%rcx
3776:   49 8b 47 08 mov0x8(%r15),%rax
377a:   48 85 c9test   %rcx,%rcx
377d:   74 06   je 3785 
377f:   48 83 c1 08 add$0x8,%rcx
3783:   eb 08   jmp378d 
3785:   49 8b 4f 20 mov0x20(%r15),%rcx
3789:   48 83 c1 18 add$0x18,%rcx
378d:   48 89 01mov%rax,(%rcx)
3790:   49 8b 07mov(%r15),%rax
3793:   49 8b 4f 08 mov0x8(%r15),%rcx
*   3797:   48 89 01mov%rax,(%rcx)
379a:   49 c7 47 08 ff ff ffmovq   $0x,0x8(%r15)
37a1:   ff 
37a2:   49 c7 07 ff ff ff ffmovq   $0x,(%r15)
/usr/src/sys/net/route.c:1492

  1484  /*
  1485   * If there's already a timer with this action, destroy it before
  1486   * we add a new one.
  1487   */
  1488  LIST_FOREACH(r, &rt->rt_timer, rtt_link) {
  1489  if (r->rtt_func == func) {
  1490  LIST_REMOVE(r, rtt_link);
* 1491  TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
  1492  if (r->rtt_queue->rtq_count > 0)
  1493  r->rtt_queue->rtq_count--;
  1494  else
  1495  printf("rt_timer_add: rtq_count reached 0\n");
  1496  pool_put(&rttimer_pool, r);
  1497  break;  /* only one per list, so we can quit... */
  1498  }
  1499  }

These lists don't look very MP safe.
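
One conceivable shape for a fix (a sketch only; the lock and its name
are an assumption here, not an existing diff) would be a dedicated
mutex taken around every rt_timer list manipulation:

	/* hypothetical lock for the rt_timer lists */
	struct mutex rttimer_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);

	/* in rt_timer_add(): walk and modify the lists only while holding it */
	mtx_enter(&rttimer_mtx);
	LIST_FOREACH(r, &rt->rt_timer, rtt_link) {
		if (r->rtt_func == func) {
			LIST_REMOVE(r, rtt_link);
			TAILQ_REMOVE(&r->rtt_queue->rtq_head, r, rtt_next);
			if (r->rtt_queue->rtq_count > 0)
				r->rtt_queue->rtq_count--;
			pool_put(&rttimer_pool, r);
			break;
		}
	}
	mtx_leave(&rttimer_mtx);

The other rt_timer functions and the timeout that fires the entries
would need the same treatment.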



Re: parallel ip forwarding

2021-12-24 Thread Alexander Bluhm
On Fri, Dec 24, 2021 at 12:55:04AM +0100, Alexander Bluhm wrote:
> If you use only regular IPsec or forwarding, I hope it is stable.

false hope

rt_timer_add(fd81b97f5390,814218b0,802040c0,0) at rt_timer_
add+0xc7
icmp_mtudisc_clone(2438040a,0,1) at icmp_mtudisc_clone+0x174
ip_output_ipsec_pmtu_update(811e35a0,8000226c0a08,2438040a,0,0) at i
p_output_ipsec_pmtu_update+0x71
ip_output_ipsec_send(811e35a0,fd80b8735000,8000226c0a08,1) at i
p_output_ipsec_send+0x231
ip_output(fd80b8735000,0,8000226c0a08,1,0,0,457ea326bbcaae85) at ip_out
put+0x7ca
ip_forward(fd80b8735000,8011e048,fd819089ce70,0) at ip_forward+
0x2da
ip_input_if(8000226c0b48,8000226c0b54,4,0,8011e048) at ip_input
_if+0x353
ipv4_input(8011e048,fd80b8735000) at ipv4_input+0x39
ether_input(8011e048,fd80b8735000) at ether_input+0x3ad
if_input_process(8011e048,8000226c0c38) at if_input_process+0x6f
ifiq_process(8011dc00) at ifiq_process+0x69
taskq_thread(8002e200) at taskq_thread+0x100
end trace frame: 0x0, count: -12



Re: parallel ip forwarding

2021-12-24 Thread Alexander Bluhm
On Fri, Dec 24, 2021 at 04:16:28PM +0900, YASUOKA Masahiko wrote:
> > - npppd l2pt ipsecflowinfo is not MP safe
> 
> Does this mean the things we are discussing on the "Fix
> ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
> another issue.

In this mail thread I was concerned about things might get worse.

Currently I see these problems:

tdb_free() will be called with a shared netlock.  From there
ipsp_ids_free() is called.

if (--ids->id_refcount > 0)
return;

This ref count needs to be atomic.

if (LIST_EMPTY(&ipsp_ids_gc_list))
timeout_add_sec(&ipsp_ids_gc_timeout, 1);
LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);

And some mutex should protect ipsp_ids_gc_list.
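
A minimal sketch of what that could look like (illustrative only; the
mutex name is made up here, mvs@'s real diff follows later in the
thread):

	struct mutex ipsp_ids_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);	/* assumed name */

	void
	ipsp_ids_free(struct ipsec_ids *ids)
	{
		if (ids == NULL)
			return;

		/* atomic decrement, safe under the shared netlock */
		if (atomic_dec_int_nv(&ids->id_refcount) > 0)
			return;

		/* the gc list and its timeout are serialized by the mutex */
		mtx_enter(&ipsp_ids_mtx);
		if (LIST_EMPTY(&ipsp_ids_gc_list))
			timeout_add_sec(&ipsp_ids_gc_timeout, 1);
		LIST_INSERT_HEAD(&ipsp_ids_gc_list, ids, id_gc_list);
		mtx_leave(&ipsp_ids_mtx);
	}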

bluhm



Re: parallel ip forwarding

2021-12-23 Thread YASUOKA Masahiko
Hello,

On Fri, 24 Dec 2021 00:55:04 +0100
Alexander Bluhm  wrote:
> On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote:
>> Note that IPsec still has the workaround to disable multiple queues.
> 
> I think we can remove the ipsec_in_use workaround now.  The IPsec
> path is protected with the kernel lock.
> 
> There are some issues left:
> - npppd l2pt ipsecflowinfo is not MP safe

Does this mean the things we are discussing on the "Fix
ipsp_spd_lookup() for transport mode" thread?  I wonder if there is
another issue.

> - the acquire SA feature is not MP safe
> - Hrvoje has seen a panic with sasync



Re: parallel ip forwarding

2021-12-23 Thread Alexander Bluhm
On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote:
> Note that IPsec still has the workaround to disable multiple queues.

I think we can remove the ipsec_in_use workaround now.  The IPsec
path is protected with the kernel lock.

There are some issues left:
- npppd l2pt ipsecflowinfo is not MP safe
- the acquire SA feature is not MP safe
- Hrvoje has seen a panic with sasync

If you use one of these, the diff below should trigger crashes faster.
If you use only regular IPsec or forwarding, I hope it is stable.

bluhm

Index: net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.644
diff -u -p -r1.644 if.c
--- net/if.c11 Nov 2021 10:03:10 -  1.644
+++ net/if.c23 Dec 2021 23:11:20 -
@@ -108,6 +108,10 @@
 #include 
 #endif
 
+#ifdef IPSEC
+#include 
+#endif
+
 #ifdef MPLS
 #include 
 #endif
@@ -237,7 +241,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -834,15 +838,10 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
-   /*
-* XXXSMP IPsec data structures are not ready to be accessed
-* by multiple network threads in parallel.  In this case
-* use an exclusive lock.
-*/
-   NET_LOCK();
+   NET_RLOCK_IN_SOFTNET();
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +898,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.276
diff -u -p -r1.276 if_ethersubr.c
--- net/if_ethersubr.c  19 Aug 2021 10:22:00 -  1.276
+++ net/if_ethersubr.c  23 Dec 2021 23:11:20 -
@@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
 #endif
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
@@ -529,12 +541,14 @@ ether_input(struct ifnet *ifp, struct mb
case ETHERTYPE_PPPOE:
if (m->m_flags & (M_MCAST | M_BCAST))
goto dropanyway;
+   KERNEL_LOCK();
 #ifdef PIPEX
if (pipex_enable) {
struct pipex_session *session;
 
if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
pipex_pppoe_input(m, session);
+   KERNEL_UNLOCK();
return;
}
}
@@ -543,6 +557,7 @@ ether_input(struct ifnet *ifp, struct mb
pppoe_disc_input(m);
else
pppoe_data_input(m);
+   KERNEL_UNLOCK();
return;
 #endif
 #ifdef MPLS
Index: net/ifq.c

Re: parallel ip forwarding

2021-12-06 Thread Alexander Bluhm
On Sat, Dec 04, 2021 at 10:41:02AM +0100, Hrvoje Popovski wrote:
> r620-2# uvm_fault(0x8229d4e0, 0x137, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  ipsp_spd_lookup+0xa2f:  movq%rax,0(%rcx)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  419237  67407  0 0x14000  0x2000  softnet
> *157694  94649  0 0x14000  0x2002K softnet
> ipsp_spd_lookup(fd80a4139800,2,14,2,0,0,5b815d966b14b44b,fd80a4139800)
> at ipsp_spd_lookup+0xa2f

Thanks a lot for the test.  It crashes here:

/home/bluhm/openbsd/cvs/src/sys/netinet/ip_spd.c:414
 cdc:   48 03 0aadd(%rdx),%rcx
*cdf:   48 89 01mov%rax,(%rcx)
 ce2:   49 8b 80 30 01 00 00mov0x130(%r8),%rax
 ce9:   49 8b 88 38 01 00 00mov0x138(%r8),%rcx
 cf0:   48 89 01mov%rax,(%rcx)
 cf3:   49 c7 80 38 01 00 00movq   $0x,0x138(%r8)
 cfa:   ff ff ff ff 
 cfe:   49 c7 80 30 01 00 00movq   $0x,0x130(%r8)
 d05:   ff ff ff ff 
/home/bluhm/openbsd/cvs/src/sys/netinet/ip_spd.c:416

  nomatchout:
/* Cached TDB was not good. */
*   TAILQ_REMOVE(&ipo->ipo_tdb->tdb_policy_head, ipo, ipo_tdb_next);
tdb_unref(ipo->ipo_tdb);
ipo->ipo_tdb = NULL;
ipo->ipo_last_searched = 0;

So mvs@'s concerns are correct, my IPsec workaround is not sufficient.
I want to avoid another rwlock in the input path.  Maybe I can throw
some mutexes into IPsec to make it work.
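
Something along these lines could serialize the cached-TDB manipulation
that crashes above (a sketch only; the mutex name is made up and this
is not a tested diff):

	/* hypothetical mutex protecting ipo_tdb and tdb_policy_head */
	struct mutex ipo_tdb_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);

	/* in ipsp_spd_lookup(), at nomatchout: */
	mtx_enter(&ipo_tdb_mtx);
	TAILQ_REMOVE(&ipo->ipo_tdb->tdb_policy_head, ipo, ipo_tdb_next);
	tdb_unref(ipo->ipo_tdb);
	ipo->ipo_tdb = NULL;
	ipo->ipo_last_searched = 0;
	mtx_leave(&ipo_tdb_mtx);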

bluhm

> ip_output_ipsec_lookup(fd80a4139800,14,0,800022c60228,0) at
> ip_output_ipsec_lookup+0x4c
> ip_output(fd80a4139800,0,800022c603e8,1,0,0,3ada3367ffb43fe1) at
> ip_output+0x39d
> ip_forward(fd80a4139800,80087048,fd8394511078,0) at
> ip_forward+0x26a
> ip_input_if(800022c60528,800022c60534,4,0,80087048) at
> ip_input_if+0x353
> ipv4_input(80087048,fd80a4139800) at ipv4_input+0x39
> ether_input(80087048,fd80a4139800) at ether_input+0x3aa
> if_input_process(80087048,800022c60618) at if_input_process+0x92
> ifiq_process(80087458) at ifiq_process+0x69
> taskq_thread(8002f080) at taskq_thread+0x81
> end trace frame: 0x0, count: 5



Re: parallel ip forwarding

2021-12-04 Thread Hrvoje Popovski
On 4.12.2021. 10:41, Hrvoje Popovski wrote:
> On 3.12.2021. 20:35, Alexander Bluhm wrote:
>> Hi,
>>
>> After implementing MP safe refcounting for IPsec TDB, I wonder how
>> stable my diff for forwarding on multiple CPUs is.
>>
>> Note that IPsec still has the workaround to disable multiple queues.
>> We do not have a mutex that protects the TDB fields yet.  We have
>> only fixed the memory management.
>>
>> My goal is to get real MP pressure on the lower part of the IP
>> network stack.  Only this will show remaining bugs.
> Hi,
> 
> With only plain forwarding it seems that everything works just fine, but
> with IPsec I'm getting this panic. I will leave the ddb console active in
> case something else is needed.


Now with WITNESS

r620-2# uvm_fault(0x8226ee78, 0x137, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  ipsp_spd_lookup+0xa2f:  movq%rax,0(%rcx)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 258332  43705  0 0x14000  0x2004  softnet
*440260  99913  0 0x14000  0x2003K softnet
  63340  24364  0 0x14000  0x2002  softnet
ipsp_spd_lookup(fd80a46ed700,2,14,2,0,0,dcfe3c78989b6cd1,fd80a46ed700)
at ipsp_spd_lookup+0xa2f
ip_output_ipsec_lookup(fd80a46ed700,14,0,800022c71248,0) at
ip_output_ipsec_lookup+0x4c
ip_output(fd80a46ed700,0,800022c71408,1,0,0,6db6075421f27eb8) at
ip_output+0x39d
ip_forward(fd80a46ed700,80087048,fd83b4374cb0,0) at
ip_forward+0x26a
ip_input_if(800022c71548,800022c71554,4,0,80087048) at
ip_input_if+0x353
ipv4_input(80087048,fd80a46ed700) at ipv4_input+0x39
ether_input(80087048,fd80a46ed700) at ether_input+0x3aa
if_input_process(80087048,800022c71638) at if_input_process+0x92
ifiq_process(80086e00) at ifiq_process+0x69
taskq_thread(80030200) at taskq_thread+0x9f
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{3}>


ddb{3}> show reg
rdi0
rsi   0x8145d018
rbp   0x800022c71150
rbx0
rdx  0x1
rcx0x137
rax   0x
r80xfd839a222d30
r90x81453904
r10   0x8142f240
r11   0xebc2ae563992db8e
r12   0x800022c71178
r130
r14  0x1
r150
rip   0x811f56cfipsp_spd_lookup+0xa2f
cs   0x8
rflags   0x10213__ALIGN_SIZE+0xf213
rsp   0x800022c71040
ss  0x10
ipsp_spd_lookup+0xa2f:  movq%rax,0(%rcx)
ddb{3}>


ddb{3}> show locks
exclusive kernel_lock _lock r = 0 (0x822b3ce8)
#0  witness_lock+0x333
#1  kpageflttrap+0x172
#2  kerntrap+0x91
#3  alltraps_kern_meltdown+0x7b
#4  ipsp_spd_lookup+0xa2f
#5  ip_output_ipsec_lookup+0x4c
#6  ip_output+0x39d
#7  ip_forward+0x26a
#8  ip_input_if+0x353
#9  ipv4_input+0x39
#10 ether_input+0x3aa
#11 if_input_process+0x92
#12 ifiq_process+0x69
#13 taskq_thread+0x9f
#14 proc_trampoline+0x1c
shared rwlock netlock r = 0 (0x8217f598)
#0  witness_lock+0x333
#1  rw_enter+0x27f
#2  if_input_process+0x75
#3  ifiq_process+0x69
#4  taskq_thread+0x9f
#5  proc_trampoline+0x1c
shared rwlock softnet r = 0 (0x80030270)
#0  witness_lock+0x333
#1  taskq_thread+0x92
#2  proc_trampoline+0x1c
ddb{3}>


ddb{3}> show all locks
Process 43705 (softnet) thread 0x8000e540 (258332)
shared rwlock netlock r = 0 (0x8217f598)
#0  witness_lock+0x333
#1  rw_enter+0x27f
#2  if_input_process+0x75
#3  ifiq_process+0x69
#4  taskq_thread+0x9f
#5  proc_trampoline+0x1c
shared rwlock softnet r = 0 (0x80030370)
#0  witness_lock+0x333
#1  taskq_thread+0x92
#2  proc_trampoline+0x1c
Process 99913 (softnet) thread 0x8000e000 (440260)
exclusive kernel_lock _lock r = 0 (0x822b3ce8)
#0  witness_lock+0x333
#1  kpageflttrap+0x172
#2  kerntrap+0x91
#3  alltraps_kern_meltdown+0x7b
#4  ipsp_spd_lookup+0xa2f
#5  ip_output_ipsec_lookup+0x4c
#6  ip_output+0x39d
#7  ip_forward+0x26a
#8  ip_input_if+0x353
#9  ipv4_input+0x39
#10 ether_input+0x3aa
#11 if_input_process+0x92
#12 ifiq_process+0x69
#13 taskq_thread+0x9f
#14 proc_trampoline+0x1c
shared rwlock netlock r = 0 (0x8217f598)
#0  witness_lock+0x333
#1  rw_enter+0x27f
#2  if_input_process+0x75
#3  ifiq_process+0x69
#4  taskq_thread+0x9f
#5  proc_trampoline+0x1c
shared rwlock softnet r = 0 (0x80030270)
#0  witness_lock+0x333
#1  taskq_thread+0x92
#2  proc_trampoline+0x1c
Process 39040 (softnet) thread 0x8000e2a0 (358256)
shared rwlock softnet r = 0 (0x80030170)
#0  witness_lock+0x333
#1  

Re: parallel ip forwarding

2021-12-04 Thread Hrvoje Popovski
On 4.12.2021. 10:41, Hrvoje Popovski wrote:
> Hi,
> 
> With only plain forwarding it seems that everything works just fine, but
> with IPsec I'm getting this panic. I will leave the ddb console active in
> case something else is needed.

Regarding performance: without the diff I'm getting 1.1 Mpps and with this
diff I'm getting 2 Mpps of plain forwarding.




Re: parallel ip forwarding

2021-12-04 Thread Hrvoje Popovski
On 3.12.2021. 20:35, Alexander Bluhm wrote:
> Hi,
> 
> After implementing MP safe refcounting for IPsec TDB, I wonder how
> stable my diff for forwarding on multiple CPUs is.
> 
> Note that IPsec still has the workaround to disable multiple queues.
> We do not have a mutex that protects the TDB fields yet.  We have
> only fixed the memory management.
> 
> My goal is to get real MP pressure on the lower part of the IP
> network stack.  Only this will show remaining bugs.

Hi,

With only plain forwarding it seems that everything works just fine, but
with IPsec I'm getting this panic. I will leave the ddb console active in
case something else is needed.


r620-2# uvm_fault(0x8229d4e0, 0x137, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  ipsp_spd_lookup+0xa2f:  movq%rax,0(%rcx)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 419237  67407  0 0x14000  0x2000  softnet
*157694  94649  0 0x14000  0x2002K softnet
ipsp_spd_lookup(fd80a4139800,2,14,2,0,0,5b815d966b14b44b,fd80a4139800)
at ipsp_spd_lookup+0xa2f
ip_output_ipsec_lookup(fd80a4139800,14,0,800022c60228,0) at
ip_output_ipsec_lookup+0x4c
ip_output(fd80a4139800,0,800022c603e8,1,0,0,3ada3367ffb43fe1) at
ip_output+0x39d
ip_forward(fd80a4139800,80087048,fd8394511078,0) at
ip_forward+0x26a
ip_input_if(800022c60528,800022c60534,4,0,80087048) at
ip_input_if+0x353
ipv4_input(80087048,fd80a4139800) at ipv4_input+0x39
ether_input(80087048,fd80a4139800) at ether_input+0x3aa
if_input_process(80087048,800022c60618) at if_input_process+0x92
ifiq_process(80087458) at ifiq_process+0x69
taskq_thread(8002f080) at taskq_thread+0x81
end trace frame: 0x0, count: 5
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{2}>

ddb{2}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 84832   15199  94097 68  30x10  netlock   isakmpd
 94097  209788  1  0  30x80  netio isakmpd
 15214  351483  1  0  30x100083  ttyin ksh
 12905  119865  1  0  30x100098  poll  cron
 84388  106290  61660 95  30x100092  kqreadsmtpd
 49054  464016  61660103  30x100092  kqreadsmtpd
 24623  388600  61660 95  30x100092  kqreadsmtpd
 25783   71877  61660 95  30x100092  kqreadsmtpd
 44044  451515  61660 95  30x100092  kqreadsmtpd
 74629  163500  61660 95  30x100092  kqreadsmtpd
 61660   91566  1  0  30x100080  kqreadsmtpd
 44941  436617  1  0  30x88  poll  sshd
 63116  236112  1  0  30x100080  poll  ntpd
 67685  455461  13478 83  30x100092  poll  ntpd
 13478  271069  1 83  30x100092  poll  ntpd
  4556  497201  93102 73  30x100090  kqreadsyslogd
 93102 715  1  0  30x100082  netio syslogd
 75807   17005  0  0  3 0x14200  bored smr
 19891  296875  0  0  3 0x14200  pgzerozerothread
 85805  493633  0  0  3 0x14200  aiodoned  aiodoned
 30502  291595  0  0  3 0x14200  syncerupdate
 70346  429811  0  0  3 0x14200  cleaner   cleaner
  2149  206582  0  0  3 0x14200  reaperreaper
 34585  457270  0  0  3 0x14200  pgdaemon  pagedaemon
 68729  397903  0  0  3 0x14200  usbtskusbtask
 65328  269730  0  0  3 0x14200  usbatsk   usbatsk
 69367  498053  0  0  3  0x40014200  acpi0 acpi0
  5156  405582  0  0  7  0x40014200idle5
 53211  338561  0  0  7  0x40014200idle4
 19386  131626  0  0  7  0x40014200idle3
 22779  421169  0  0  3  0x40014200idle2
 27190  363359  0  0  7  0x40014200idle1
  1080   80711  0  0  3 0x14200  bored sensors
 67407  419237  0  0  7 0x14200softnet
 34617   10042  0  0  3 0x14200  netlock   softnet
 50133  404397  0  0  3 0x14200  netlock   softnet
*94649  157694  0  0  7 0x14200softnet
 51030  188067  0  0  3 0x14200  bored systqmp
 46228  294212  0  0  3 0x14200  bored systq
 62154  451682  0  0  2  0x40014200softclock
 34808  227380  0  0  3  0x40014200idle0
 1   24458  0  0  30x82  wait  init
 0   0 -1  0  3 0x10200  scheduler swapper


ddb{2}> trace /t 0t157694
80087048(1,8122bc39,800022c60550,800022c60528,80002
2c60534,4) at 0x80087048

Re: parallel ip forwarding

2021-12-03 Thread Vitaliy Makkoveev
On Fri, Dec 03, 2021 at 08:35:45PM +0100, Alexander Bluhm wrote:
> Hi,
> 
> After implementing MP safe refcounting for IPsec TDB, I wonder how
> stable my diff for forwarding on multiple CPUs is.
> 
> Note that IPsec still has the workaround to disable multiple queues.
> We do not have a mutex that protects the TDB fields yet.  We have
> only fixed the memory management.
> 
> My goal is to get real MP pressure on the lower part of the IP
> network stack.  Only this will show remaining bugs.
> 
> bluhm
> 

This unlocked `ipsec_in_use' check doesn't work. 

SADB_X_ADDFLOW and `ipsec_in_use' increment are netlock serialized. 

>  if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
>  {
>   struct mbuf *m;
> + int exclusive_lock = 0;
>  
>   if (ml_empty(ml))
>   return;
> @@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru
>* lists and the socket layer.
>*/
>  
> +#ifdef IPSEC
>   /*
>* XXXSMP IPsec data structures are not ready to be accessed
>* by multiple network threads in parallel.  In this case
>* use an exclusive lock.
>*/
> - NET_LOCK();
> + if (ipsec_in_use)
> + exclusive_lock = 1;
> +#endif

Concurrent SADB_X_ADDFLOW threads could add some SA between your
unlocked `ipsec_in_use' check and the following shared netlock.  At the
moment the `ipsec_in_use' modification is netlock serialized, but you
could add a new rwlock(9).  Since we are only interested in a non-null
value of `ipsec_in_use', it's enough to take it here and around the
SADB_X_ADDFLOW case within pfkeyv2_send().
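
To spell out the window (an illustration of the race described above,
not code from the tree):

	/*
	 * softnet thread                    pfkey thread (SADB_X_ADDFLOW)
	 * --------------                    -----------------------------
	 * if (ipsec_in_use)   -> still 0
	 *                                   NET_LOCK();
	 *                                   install the first flow;
	 *                                   ipsec_in_use++;
	 *                                   NET_UNLOCK();
	 * NET_RLOCK_IN_SOFTNET();
	 * packets now reach IPsec code that still expects the exclusive
	 * netlock, concurrently with the other softnet threads
	 */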

I already posted the fix for this to your old thread [1].

> + if (exclusive_lock)
> + NET_LOCK();
> + else
> + NET_RLOCK_IN_SOFTNET();
> +
>   while ((m = ml_dequeue(ml)) != NULL)
>   (*ifp->if_input)(ifp, m);

1. https://marc.info/?l=openbsd-tech&m=162698721127298&w=2

> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.644
> diff -u -p -r1.644 if.c
> --- net/if.c  11 Nov 2021 10:03:10 -  1.644
> +++ net/if.c  3 Dec 2021 18:51:29 -
> @@ -108,6 +108,10 @@
>  #include 
>  #endif
>  
> +#ifdef IPSEC
> +#include 
> +#endif
> +
>  #ifdef MPLS
>  #include 
>  #endif
> @@ -237,7 +241,7 @@ int   ifq_congestion;
>  
>  int   netisr;
>  
> -#define  NET_TASKQ   1
> +#define  NET_TASKQ   4
>  struct taskq *nettqmp[NET_TASKQ];
>  
>  struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
> @@ -814,6 +818,7 @@ void
>  if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
>  {
>   struct mbuf *m;
> + int exclusive_lock = 0;
>  
>   if (ml_empty(ml))
>   return;
> @@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru
>* lists and the socket layer.
>*/
>  
> +#ifdef IPSEC
>   /*
>* XXXSMP IPsec data structures are not ready to be accessed
>* by multiple network threads in parallel.  In this case
>* use an exclusive lock.
>*/
> - NET_LOCK();
> + if (ipsec_in_use)
> + exclusive_lock = 1;
> +#endif
> + if (exclusive_lock)
> + NET_LOCK();
> + else
> + NET_RLOCK_IN_SOFTNET();
> +
>   while ((m = ml_dequeue(ml)) != NULL)
>   (*ifp->if_input)(ifp, m);
> - NET_UNLOCK();
> +
> + if (exclusive_lock)
> + NET_UNLOCK();
> + else
> + NET_RUNLOCK_IN_SOFTNET();
>  }
>  
>  void
> @@ -899,6 +916,12 @@ if_netisr(void *unused)
>   arpintr();
>   KERNEL_UNLOCK();
>   }
> +#endif
> + if (n & (1 << NETISR_IP))
> + ipintr();
> +#ifdef INET6
> + if (n & (1 << NETISR_IPV6))
> + ip6intr();
>  #endif
>  #if NPPP > 0
>   if (n & (1 << NETISR_PPP)) {
> Index: net/if_ethersubr.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
> retrieving revision 1.276
> diff -u -p -r1.276 if_ethersubr.c
> --- net/if_ethersubr.c19 Aug 2021 10:22:00 -  1.276
> +++ net/if_ethersubr.c3 Dec 2021 18:51:29 -
> @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
>  
>   switch (af) {
>   case AF_INET:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in arpresolve() */
>   error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   eh->ether_type = htons(ETHERTYPE_IP);
> @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
>   break;
>  #ifdef INET6
>   case AF_INET6:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in nd6_resolve() */
>  

parallel ip forwarding

2021-12-03 Thread Alexander Bluhm
Hi,

After implementing MP safe refcounting for IPsec TDB, I wonder how
stable my diff for forwarding on multiple CPUs is.

Note that IPsec still has the workaround to disable multiple queues.
We do not have a mutex that protects the TDB fields yet.  We have
only fixed the memory management.

My goal is to get real MP pressure on the lower part of the IP
network stack.  Only this will show remaining bugs.

bluhm

Index: net/if.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.644
diff -u -p -r1.644 if.c
--- net/if.c11 Nov 2021 10:03:10 -  1.644
+++ net/if.c3 Dec 2021 18:51:29 -
@@ -108,6 +108,10 @@
 #include 
 #endif
 
+#ifdef IPSEC
+#include 
+#endif
+
 #ifdef MPLS
 #include 
 #endif
@@ -237,7 +241,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -814,6 +818,7 @@ void
 if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
 {
struct mbuf *m;
+   int exclusive_lock = 0;
 
if (ml_empty(ml))
return;
@@ -834,15 +839,27 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
+#ifdef IPSEC
/*
 * XXXSMP IPsec data structures are not ready to be accessed
 * by multiple network threads in parallel.  In this case
 * use an exclusive lock.
 */
-   NET_LOCK();
+   if (ipsec_in_use)
+   exclusive_lock = 1;
+#endif
+   if (exclusive_lock)
+   NET_LOCK();
+   else
+   NET_RLOCK_IN_SOFTNET();
+
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+
+   if (exclusive_lock)
+   NET_UNLOCK();
+   else
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -899,6 +916,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.276
diff -u -p -r1.276 if_ethersubr.c
--- net/if_ethersubr.c  19 Aug 2021 10:22:00 -  1.276
+++ net/if_ethersubr.c  3 Dec 2021 18:51:29 -
@@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
 #endif
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
@@ -529,12 +541,14 @@ ether_input(struct ifnet *ifp, struct mb
case ETHERTYPE_PPPOE:
if (m->m_flags & (M_MCAST | M_BCAST))
goto dropanyway;
+   KERNEL_LOCK();
 #ifdef PIPEX
if (pipex_enable) {
struct pipex_session *session;
 
if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
pipex_pppoe_input(m, session);
+   KERNEL_UNLOCK();