Re: forwarding in parallel ipsec workaround

2021-10-04 Thread Alexander Bluhm
On Sat, Oct 02, 2021 at 03:23:57PM -0700, Chris Cappuccio wrote:
> Hrvoje Popovski [hrv...@srce.hr] wrote:
> > 
> > box didn't panic, just stopped forwarding traffic through tunnel.
> 
> any chance any progress has been made here? is there any newer versions
> of these diffs floating around?

Main problem is that ipsec is not stable and we only have an ugly
workaround.  One solution is to implement tdb ref counting.  I am
working on a diff.

bluhm



Re: forwarding in parallel ipsec workaround

2021-10-02 Thread Chris Cappuccio
Hrvoje Popovski [hrv...@srce.hr] wrote:
> 
> box didn't panic, just stopped forwarding traffic through tunnel.

any chance any progress has been made here? is there any newer versions
of these diffs floating around?



Re: forwarding in parallel ipsec workaround

2021-07-23 Thread Hrvoje Popovski
On 23.7.2021. 16:20, Vitaliy Makkoveev wrote:
> On Thu, Jul 22, 2021 at 11:30:02PM +0200, Hrvoje Popovski wrote:
>> On 22.7.2021. 22:52, Vitaliy Makkoveev wrote:
>>> On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote:
 On 22.7.2021. 12:21, Hrvoje Popovski wrote:
> Thank you for explanation..
>
> after hitting box all night, box panic and i was able to reproduce it
> this morning ... I'm not sure but box panic after hour or more of
> sending traffic through iked tunnel ..
> I will try to reproduce it through ipsec tunnel ..

 with isakmpd i've got panic after 5 or more hours

 r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e
 kernel: page fault trap, code=0
 Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
 TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
  469292  75215 680x10  02  isakmpd
 *142487  42745  0 0x14000 0x42000K softclock
 tdb_free(8118f6e0) at tdb_free+0x9c
 tdb_timeout(8118f6e0) at tdb_timeout+0x77
 softclock_thread(8000efc0) at softclock_thread+0x16e
 end trace frame: 0x0, count: 12
 https://www.openbsd.org/ddb.html describes the minimum info required in
 bug reports.  Insufficient info makes it difficult to find and fix bugs.

>>> There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can
>>> you try it?
>>
>> Hi,
>>
>> I'm running this diff with "ipsec crypto no queue" and i'm seeing
>> traffic drops. This is little strange because "ipsec crypto no queue"
>> diff take care of those drops...
>>
> 
> Thanks!
> 
> Did you caught panics?
> Could you share "netstat -s -p ipsec" output?
> 

Hi,

box didn't panic, just stopped forwarding traffic through tunnel.

ipsec:
0 input IPsec packets
3299331 output IPsec packets
0 input bytes
1781638740 output bytes
0 input bytes, decompressed
2235858152 output bytes, uncompressed
0 packets dropped on input
1030761 packets dropped on output
0 packets that failed crypto processing
0 packets for which no XFORM was set in TDB received
447309 packets for which no TDB was found

it still little strange to me that with "ipsec crypto no queue"
diff i see traffic drops ... so i will do all over again just to be sure
that i'm not doing something wrong ..



Re: forwarding in parallel ipsec workaround

2021-07-23 Thread Vitaliy Makkoveev
On Thu, Jul 22, 2021 at 11:30:02PM +0200, Hrvoje Popovski wrote:
> On 22.7.2021. 22:52, Vitaliy Makkoveev wrote:
> > On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote:
> >> On 22.7.2021. 12:21, Hrvoje Popovski wrote:
> >>> Thank you for explanation..
> >>>
> >>> after hitting box all night, box panic and i was able to reproduce it
> >>> this morning ... I'm not sure but box panic after hour or more of
> >>> sending traffic through iked tunnel ..
> >>> I will try to reproduce it through ipsec tunnel ..
> >>
> >> with isakmpd i've got panic after 5 or more hours
> >>
> >> r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e
> >> kernel: page fault trap, code=0
> >> Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
> >> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >>  469292  75215 680x10  02  isakmpd
> >> *142487  42745  0 0x14000 0x42000K softclock
> >> tdb_free(8118f6e0) at tdb_free+0x9c
> >> tdb_timeout(8118f6e0) at tdb_timeout+0x77
> >> softclock_thread(8000efc0) at softclock_thread+0x16e
> >> end trace frame: 0x0, count: 12
> >> https://www.openbsd.org/ddb.html describes the minimum info required in
> >> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> >>
> > There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can
> > you try it?
> 
> Hi,
> 
> I'm running this diff with "ipsec crypto no queue" and i'm seeing
> traffic drops. This is little strange because "ipsec crypto no queue"
> diff take care of those drops...
> 

Thanks!

Did you caught panics?
Could you share "netstat -s -p ipsec" output?



Re: forwarding in parallel ipsec workaround

2021-07-22 Thread Hrvoje Popovski
On 22.7.2021. 22:52, Vitaliy Makkoveev wrote:
> On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote:
>> On 22.7.2021. 12:21, Hrvoje Popovski wrote:
>>> Thank you for explanation..
>>>
>>> after hitting box all night, box panic and i was able to reproduce it
>>> this morning ... I'm not sure but box panic after hour or more of
>>> sending traffic through iked tunnel ..
>>> I will try to reproduce it through ipsec tunnel ..
>>
>> with isakmpd i've got panic after 5 or more hours
>>
>> r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e
>> kernel: page fault trap, code=0
>> Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
>> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>>  469292  75215 680x10  02  isakmpd
>> *142487  42745  0 0x14000 0x42000K softclock
>> tdb_free(8118f6e0) at tdb_free+0x9c
>> tdb_timeout(8118f6e0) at tdb_timeout+0x77
>> softclock_thread(8000efc0) at softclock_thread+0x16e
>> end trace frame: 0x0, count: 12
>> https://www.openbsd.org/ddb.html describes the minimum info required in
>> bug reports.  Insufficient info makes it difficult to find and fix bugs.
>>
> There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can
> you try it?

Hi,

I'm running this diff with "ipsec crypto no queue" and i'm seeing
traffic drops. This is little strange because "ipsec crypto no queue"
diff take care of those drops...



Re: forwarding in parallel ipsec workaround

2021-07-22 Thread Vitaliy Makkoveev
On Thu, Jul 22, 2021 at 08:38:04PM +0200, Hrvoje Popovski wrote:
> On 22.7.2021. 12:21, Hrvoje Popovski wrote:
> > Thank you for explanation..
> > 
> > after hitting box all night, box panic and i was able to reproduce it
> > this morning ... I'm not sure but box panic after hour or more of
> > sending traffic through iked tunnel ..
> > I will try to reproduce it through ipsec tunnel ..
> 
> 
> with isakmpd i've got panic after 5 or more hours
> 
> r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
>  469292  75215 680x10  02  isakmpd
> *142487  42745  0 0x14000 0x42000K softclock
> tdb_free(8118f6e0) at tdb_free+0x9c
> tdb_timeout(8118f6e0) at tdb_timeout+0x77
> softclock_thread(8000efc0) at softclock_thread+0x16e
> end trace frame: 0x0, count: 12
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> 

There is the bluhm@'s diff with serialized `ipsec_in_use' access. Can
you try it?

Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.643
diff -u -p -r1.643 if.c
--- sys/net/if.c20 Jul 2021 16:32:28 -  1.643
+++ sys/net/if.c22 Jul 2021 20:46:19 -
@@ -109,6 +109,10 @@
 #include 
 #endif
 
+#ifdef IPSEC
+#include 
+#endif
+
 #ifdef MPLS
 #include 
 #endif
@@ -238,7 +242,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -811,10 +815,13 @@ if_output_local(struct ifnet *ifp, struc
return (ifiq_enqueue(ifiq, m) == 0 ? 0 : ENOBUFS);
 }
 
+struct rwlock ipsec_lock = RWLOCK_INITIALIZER("ipsecl");
+
 void
 if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
 {
struct mbuf *m;
+   int exclusive_lock = 0;
 
if (ml_empty(ml))
return;
@@ -835,15 +842,30 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
+   rw_enter_read(_lock);
+#ifdef IPSEC
/*
 * XXXSMP IPsec data structures are not ready to be accessed
 * by multiple network threads in parallel.  In this case
 * use an exclusive lock.
 */
-   NET_LOCK();
+   if (ipsec_in_use)
+   exclusive_lock = 1;
+#endif
+   if (exclusive_lock)
+   NET_LOCK();
+   else
+   NET_RLOCK_IN_SOFTNET();
+
+   rw_exit_read(_lock);
+
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+
+   if (exclusive_lock)
+   NET_UNLOCK();
+   else
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -900,6 +922,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: sys/net/if_ethersubr.c
===
RCS file: /cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.275
diff -u -p -r1.275 if_ethersubr.c
--- sys/net/if_ethersubr.c  7 Jul 2021 20:19:01 -   1.275
+++ sys/net/if_ethersubr.c  22 Jul 2021 20:46:19 -
@@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
 

Re: forwarding in parallel ipsec workaround

2021-07-22 Thread Hrvoje Popovski
On 22.7.2021. 12:21, Hrvoje Popovski wrote:
> Thank you for explanation..
> 
> after hitting box all night, box panic and i was able to reproduce it
> this morning ... I'm not sure but box panic after hour or more of
> sending traffic through iked tunnel ..
> I will try to reproduce it through ipsec tunnel ..


with isakmpd i've got panic after 5 or more hours

r620-1# uvm_fault(0x822b2158, 0x137, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
 469292  75215 680x10  02  isakmpd
*142487  42745  0 0x14000 0x42000K softclock
tdb_free(8118f6e0) at tdb_free+0x9c
tdb_timeout(8118f6e0) at tdb_timeout+0x77
softclock_thread(8000efc0) at softclock_thread+0x16e
end trace frame: 0x0, count: 12
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.


ddb{0}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 75215  469292  76137 68  70x10isakmpd
 76137  414524  1  0  30x80  netio isakmpd
 96813  295173  1  0  30x100083  ttyin ksh
 19120  299871  1  0  30x100098  poll  cron
  1532   13552  38673 95  30x100092  kqreadsmtpd
 26046  463899  38673103  30x100092  kqreadsmtpd
 36828  266802  38673 95  30x100092  kqreadsmtpd
 70875  110170  38673 95  30x100092  kqreadsmtpd
 94519   66859  38673 95  30x100092  kqreadsmtpd
 82071  306235  38673 95  30x100092  kqreadsmtpd
 38673  196258  1  0  30x100080  kqreadsmtpd
 32975  338908  1  0  30x88  selectsshd
 83970  285872  1  0  30x100080  poll  ntpd
 13501  277269  28135 83  30x100092  poll  ntpd
 28135  154371  1 83  30x100092  poll  ntpd
 46277  200739  37910 73  30x100090  kqreadsyslogd
 37910  178152  1  0  30x100082  netio syslogd
 15169  243641  0  0  3 0x14200  bored smr
 72867   54384  0  0  3 0x14200  pgzerozerothread
 96747  464994  0  0  3 0x14200  aiodoned  aiodoned
 21420  278055  0  0  3 0x14200  syncerupdate
  4733   35757  0  0  3 0x14200  cleaner   cleaner
 65184  477103  0  0  3 0x14200  reaperreaper
 74689  248130  0  0  3 0x14200  pgdaemon  pagedaemon
 63567  184653  0  0  3 0x14200  bored crynlk
 25576  473224  0  0  3 0x14200  bored crypto
 59824  205269  0  0  3 0x14200  usbtskusbtask
 79880  326022  0  0  3 0x14200  usbatsk   usbatsk
 92636  507494  0  0  3  0x40014200  acpi0 acpi0
  1561  284019  0  0  7  0x40014200idle5
 71732  313587  0  0  7  0x40014200idle4
 50021  333287  0  0  7  0x40014200idle3
 87783  242752  0  0  3  0x40014200idle2
 48193   69868  0  0  7  0x40014200idle1
 80238  196627  0  0  3 0x14200  bored sensors
 36233   88931  0  0  3 0x14200  netlock   softnet
 76181  390400  0  0  3 0x14200  netlock   softnet
 12911  400662  0  0  3 0x14200  netlock   softnet
 23428   46363  0  0  3 0x14200  netlock   softnet
 22304  109508  0  0  3 0x14200  bored systqmp
 97244  473960  0  0  3 0x14200  bored systq
*42745  142487  0  0  7  0x40014200softclock
  7369  374311  0  0  3  0x40014200idle0
 1  360880  0  0  30x82  wait  init
 0   0 -1  0  3 0x10200  scheduler swapper
ddb{0}>


ddb{0}> trace /t 0t142487
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
tdb_free(8118f6e0) at tdb_free+0x9c
tdb_timeout(8118f6e0) at tdb_timeout+0x77
softclock_thread(8000efc0) at softclock_thread+0x16e
end trace frame: 0x0, count: -4
ddb{0}>


ddb{0}> trace /t 0t469292
sleep_finish(800023906db0,1) at sleep_finish+0x11c
rw_enter(82187ca8,1) at rw_enter+0x1c2
solock(fd83b1019e10) at solock+0x4b
soo_poll(fd83b1989a68,1,8000238fb7a0) at soo_poll+0x34
selscan(8000238fb7a0,800023906fd0,800023906fdc,14,4,8000239071b
0) at selscan+0x14b
dopselect(8000238fb7a0,15,bf1473583e0,bf14736a140,0,8000239070e0) at
dopselect+0x487
sys_pselect(8000238fb7a0,800023907150,8000239071b0) at
sys_pselect+0xdb
syscall(800023907220) at syscall+0x3a9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d45b0, count: -9
ddb{0}>



Re: forwarding in parallel ipsec workaround

2021-07-22 Thread Vitaliy Makkoveev
On Thu, Jul 22, 2021 at 12:21:47PM +0200, Hrvoje Popovski wrote:
> On 22.7.2021. 0:39, Alexander Bluhm wrote:
> > On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote:
> >> I'm combining this and last parallel diff and i can't see any drops in
> >> traffic. Even sending at high rate, traffic through iked or isakmpd is
> >> stable at 150Kpps, which is good ..
> > 
> > Thanks, good news.
> > 
> >> One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00%
> >> Could it be because of aes-ni?
> > 
> > I don't use the crypto thread anymore.  It is idle.  All crypto is
> > done in softnet between handling the network part of ipsec.
> > 
> > Here is the flame graph.  Crypto is running on top of network.
> > http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg
> > 
> > bluhm
> > 
> 
> Thank you for explanation..
> 
> after hitting box all night, box panic and i was able to reproduce it
> this morning ... I'm not sure but box panic after hour or more of
> sending traffic through iked tunnel ..
> I will try to reproduce it through ipsec tunnel ..

Unfortunately this unlocked `ipsec_in_use' check can't work as expected,
because concurrent pfkey thread could modify it between the check (1)
and netlock (2). You a lucky to hit this :)

 +   /*
 +* XXXSMP IPsec data structures are not ready to be
 +* accessed by multiple Network threads in parallel.
 +*/
 +   if (ipsec_in_use) /* (1) */
 +   exclusive_lock = 1;
 +   if (exclusive_lock)
 +   NET_LOCK(); /* (2) */
 +   else
 +   NET_RLOCK_IN_SOFTNET(); /* (2) */
 +
 +   while ((m = ml_dequeue(ml)) != NULL)
 + (*ifp->if_input)(ifp, m);

This could be fixed by introducing the another rwlock(9) which should
serialize `ipsec_in_use' modifications and check. It's obvious this lock
should be grabbed before netlock and we could keep this order, but I
strictly don't like this hack.

pfkeyv2_output(struct mbuf *mbuf, struct socket *so,
struct sockaddr *dstaddr, struct mbuf *control)
{
/* ... */

sounlock(so, SL_LOCKED);
rw_enter_write(_lock);
error = pfkeyv2_send(so, message, mbuf->m_pkthdr.len);
rw_exit_write(_lock);
solock(so);

/* ... */
}

if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
{
/* ... */

/*
 * XXXSMP IPsec data structures are not ready to be
 * accessed by multiple Network threads in parallel.
 */
rw_enter_read(_lock);

if (ipsec_in_use) /* (1) */
exclusive_lock = 1;
if (exclusive_lock)
NET_LOCK(); /* (2) */
else
NET_RLOCK_IN_SOFTNET(); /* (2) */

rw_exit_read(_lock);

while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);

/* ... */
}

> 
> 
> 
> 
> r620-1# uvm_fault(0x821a82d8, 0x137, 0, 2) -> e
> kernel: page fault trap, code=0
> Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
> TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> *310284  77290  0 0x14000  0x2001K softnet
> tdb_free(811832a8) at tdb_free+0x9c
> esp_output(fd80a5c2f300,811832a8,0,14,9) at esp_output+0x44f
> ipsp_process_packet(fd80a5c2f300,811832a8,2,0) at
> ipsp_process_packet+0x466
> ip_output_ipsec_send(811832a8,fd80a5c2f300,800023871cf8,1)
> at ip_output_ipsec_send+0x161
> ip_output(fd80a5c2f300,0,800023871cf8,1,0,0) at ip_output+0x8bd
> ip_forward(fd80a5c2f300,80087048,fd83b3454bd0,0) at
> ip_forward+0x26a
> ip_input_if(800023871e38,800023871e44,4,0,80087048) at
> ip_input_if+0x353
> ipv4_input(80087048,fd80a5c2f300) at ipv4_input+0x39
> if_input_process(80087048,800023871eb8) at if_input_process+0x92
> ifiq_process(80086c00) at ifiq_process+0x69
> taskq_thread(8002f200) at taskq_thread+0x81
> end trace frame: 0x0, count: 4
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> 
> 
> ddb{1}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
>  94310  423919  1  0  30x100083  ttyin ksh
>  77465  202466  1  0  30x100098  poll  cron
>  65594  481620  35139 95  30x100092  kqreadsmtpd
>   2326   44364  35139103  30x100092  kqreadsmtpd
>   1225  215234  35139 95  30x100092  kqreadsmtpd
>  69590  400378  35139 95  30x100092  kqreadsmtpd
>   7267   26043  35139 95  30x100092  kqreadsmtpd
>  38120  331881  35139 95  30x100092  kqreadsmtpd
>  35139   66391  1  0  30x100080  kqreadsmtpd
>  79402   86611  1  0  30x88  selectsshd
>  43297  495028  53834 

Re: forwarding in parallel ipsec workaround

2021-07-22 Thread Hrvoje Popovski
On 22.7.2021. 0:39, Alexander Bluhm wrote:
> On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote:
>> I'm combining this and last parallel diff and i can't see any drops in
>> traffic. Even sending at high rate, traffic through iked or isakmpd is
>> stable at 150Kpps, which is good ..
> 
> Thanks, good news.
> 
>> One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00%
>> Could it be because of aes-ni?
> 
> I don't use the crypto thread anymore.  It is idle.  All crypto is
> done in softnet between handling the network part of ipsec.
> 
> Here is the flame graph.  Crypto is running on top of network.
> http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg
> 
> bluhm
> 

Thank you for explanation..

after hitting box all night, box panic and i was able to reproduce it
this morning ... I'm not sure but box panic after hour or more of
sending traffic through iked tunnel ..
I will try to reproduce it through ipsec tunnel ..




r620-1# uvm_fault(0x821a82d8, 0x137, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at  tdb_free+0x9c:  movq%rsi,0(%rdi)
TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
*310284  77290  0 0x14000  0x2001K softnet
tdb_free(811832a8) at tdb_free+0x9c
esp_output(fd80a5c2f300,811832a8,0,14,9) at esp_output+0x44f
ipsp_process_packet(fd80a5c2f300,811832a8,2,0) at
ipsp_process_packet+0x466
ip_output_ipsec_send(811832a8,fd80a5c2f300,800023871cf8,1)
at ip_output_ipsec_send+0x161
ip_output(fd80a5c2f300,0,800023871cf8,1,0,0) at ip_output+0x8bd
ip_forward(fd80a5c2f300,80087048,fd83b3454bd0,0) at
ip_forward+0x26a
ip_input_if(800023871e38,800023871e44,4,0,80087048) at
ip_input_if+0x353
ipv4_input(80087048,fd80a5c2f300) at ipv4_input+0x39
if_input_process(80087048,800023871eb8) at if_input_process+0x92
ifiq_process(80086c00) at ifiq_process+0x69
taskq_thread(8002f200) at taskq_thread+0x81
end trace frame: 0x0, count: 4
https://www.openbsd.org/ddb.html describes the minimum info required in
bug reports.  Insufficient info makes it difficult to find and fix bugs.


ddb{1}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
 94310  423919  1  0  30x100083  ttyin ksh
 77465  202466  1  0  30x100098  poll  cron
 65594  481620  35139 95  30x100092  kqreadsmtpd
  2326   44364  35139103  30x100092  kqreadsmtpd
  1225  215234  35139 95  30x100092  kqreadsmtpd
 69590  400378  35139 95  30x100092  kqreadsmtpd
  7267   26043  35139 95  30x100092  kqreadsmtpd
 38120  331881  35139 95  30x100092  kqreadsmtpd
 35139   66391  1  0  30x100080  kqreadsmtpd
 79402   86611  1  0  30x88  selectsshd
 43297  495028  53834101  30x100010  netlock   iked
 73854  239338  53834101  30x100090  kqreadiked
 79322  243271  53834101  30x100090  kqreadiked
 53834  169269  1  0  30x100080  kqreadiked
 98497  240479  1  0  30x100080  poll  ntpd
 93883  433035  46155 83  30x100092  poll  ntpd
 46155  241594  1 83  30x100092  poll  ntpd
 60135   75740  14081 73  30x100090  kqreadsyslogd
 14081  390346  1  0  30x100082  netio syslogd
 95646  201039  0  0  3 0x14200  bored smr
 54746  191914  0  0  3 0x14200  pgzerozerothread
 42039  120601  0  0  3 0x14200  aiodoned  aiodoned
 23202  404058  0  0  3 0x14200  syncerupdate
 32492  255945  0  0  3 0x14200  cleaner   cleaner
 63792  183728  0  0  3 0x14200  reaperreaper
 85251  251516  0  0  3 0x14200  pgdaemon  pagedaemon
 70650  399091  0  0  3 0x14200  bored crynlk
 93527  443884  0  0  3 0x14200  bored crypto
 98145   39889  0  0  3 0x14200  usbtskusbtask
 67334  434779  0  0  3 0x14200  usbatsk   usbatsk
 86238  520690  0  0  3  0x40014200  acpi0 acpi0
 52180  314140  0  0  7  0x40014200idle5
 23611  433449  0  0  7  0x40014200idle4
 40949  492122  0  0  7  0x40014200idle3
  1758  355637  0  0  7  0x40014200idle2
 24269   99037  0  0  3  0x40014200idle1
 26630  422993  0  0  3 0x14200  bored sensors
*77290  310284  0  0  7 0x14200softnet
 68084  469615  0  0  3 0x14200  bored softnet
 68120  339291  0  0  3 0x14200  bored 

Re: [External] : Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Alexandr Nedvedicky
Hello,

my statement here is just for the record. we should have a follow up
discussion in a different thread, which is yet to be started (when time
will come).


> 
> > - Make ARP MP safe.  Currently we need the kernel lock there or
> >   it crashes.  This creates latency for all kind of packets.
> > - Convert the rwlock in pf to mutex.  I think your argument counts
> >   much more there.  But I cannot prove it.
> 
> My argument about what?  Didn't we all agree about converting the rwlock
> to a mutex for now?
> 

I disagree here. If we will trade current rw-lock, which protects
state table in pf(4), we will see significant drop in throughput.
great deal of packets is forwarded as readers on pf_state_lock.
only those, which don't match state are taking slow path. to turn
rw-lock to mutex, we must break single giant tree into smaller
buckets protected by mutexes. Or something like that must happen
first.

thanks and
regards
sashan



Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Alexander Bluhm
On Thu, Jul 22, 2021 at 12:06:09AM +0200, Hrvoje Popovski wrote:
> I'm combining this and last parallel diff and i can't see any drops in
> traffic. Even sending at high rate, traffic through iked or isakmpd is
> stable at 150Kpps, which is good ..

Thanks, good news.

> One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00%
> Could it be because of aes-ni?

I don't use the crypto thread anymore.  It is idle.  All crypto is
done in softnet between handling the network part of ipsec.

Here is the flame graph.  Crypto is running on top of network.
http://bluhm.genua.de/perform/results/current/patch-sys-ipsec-noqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10-btrace-kstack.0.svg

bluhm



Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Hrvoje Popovski
On 21.7.2021. 22:21, Alexander Bluhm wrote:
> Ahh, to many diffs in my tree.  I have forgotten the cunk
> crp->crp_flags = ... | CRYPTO_F_NOQUEUE
> 
> Try this.  Still testing it myself, it looks a bit faster.


I'm combining this and last parallel diff and i can't see any drops in
traffic. Even sending at high rate, traffic through iked or isakmpd is
stable at 150Kpps, which is good ..

One funny thing is that with top -SHs1 crypto CPU usage is always at 0.00%
Could it be because of aes-ni?


r620-1# cat /etc/ipsec.conf
ike active esp from 192.168.232.0/24 to 192.168.123.0/24 \
local 192.168.42.1 peer 192.168.42.111 \
main auth hmac-sha1 enc aes group modp1024 \
quick enc aes-128-gcm group modp1024 \
psk "123"

r620-1# cat /etc/iked.conf
ikev2 active esp from 192.168.232.0/24 to 192.168.123.0/24 \
local 192.168.42.1 peer 192.168.42.111 \
ikesa enc aes-128-gcm group modp1024 prf hmac-sha1 \
childsa enc aes-128-gcm group modp1024 \
psk "123"



Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Vitaliy Makkoveev
On Wed, Jul 21, 2021 at 06:41:30PM +0200, Alexander Bluhm wrote:
> On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote:
> > Hi, pipex(4) is also not ready for parallel access. In the chunk below
> > it will be accessed through (*ifp->if_input)() -> ether_input() ->
> > pipex_pppoe_input(). This looks not fatal but makes at least session
> > statistics inconsistent.
> 
> For pipex and pppoe we can put a kernel lock into ether_resolve().
> I also put locks in mpls.
> 
> I am not aware of any other issues.
> 
> Can we commit this now?
> 

Hmm, did I missed locks for mpls?

> bluhm
> 
> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.643
> diff -u -p -r1.643 if.c
> --- net/if.c  20 Jul 2021 16:32:28 -  1.643
> +++ net/if.c  21 Jul 2021 16:07:15 -
> @@ -109,6 +109,10 @@
>  #include 
>  #endif
>  
> +#ifdef IPSEC
> +#include 
> +#endif
> +
>  #ifdef MPLS
>  #include 
>  #endif
> @@ -238,7 +242,7 @@ int   ifq_congestion;
>  
>  int   netisr;
>  
> -#define  NET_TASKQ   1
> +#define  NET_TASKQ   4
>  struct taskq *nettqmp[NET_TASKQ];
>  
>  struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
> @@ -815,6 +819,7 @@ void
>  if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
>  {
>   struct mbuf *m;
> + int exclusive_lock = 0;
>  
>   if (ml_empty(ml))
>   return;
> @@ -835,15 +840,27 @@ if_input_process(struct ifnet *ifp, stru
>* lists and the socket layer.
>*/
>  
> +#ifdef IPSEC
>   /*
>* XXXSMP IPsec data structures are not ready to be accessed
>* by multiple network threads in parallel.  In this case
>* use an exclusive lock.
>*/
> - NET_LOCK();
> + if (ipsec_in_use)
> + exclusive_lock = 1;
> +#endif
> + if (exclusive_lock)
> + NET_LOCK();
> + else
> + NET_RLOCK_IN_SOFTNET();
> +
>   while ((m = ml_dequeue(ml)) != NULL)
>   (*ifp->if_input)(ifp, m);
> - NET_UNLOCK();
> +
> + if (exclusive_lock)
> + NET_UNLOCK();
> + else
> + NET_RUNLOCK_IN_SOFTNET();
>  }
>  
>  void
> @@ -900,6 +917,12 @@ if_netisr(void *unused)
>   arpintr();
>   KERNEL_UNLOCK();
>   }
> +#endif
> + if (n & (1 << NETISR_IP))
> + ipintr();
> +#ifdef INET6
> + if (n & (1 << NETISR_IPV6))
> + ip6intr();
>  #endif
>  #if NPPP > 0
>   if (n & (1 << NETISR_PPP)) {
> Index: net/if_ethersubr.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
> retrieving revision 1.275
> diff -u -p -r1.275 if_ethersubr.c
> --- net/if_ethersubr.c7 Jul 2021 20:19:01 -   1.275
> +++ net/if_ethersubr.c21 Jul 2021 16:34:51 -
> @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
>  
>   switch (af) {
>   case AF_INET:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in arpresolve() */
>   error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   eh->ether_type = htons(ETHERTYPE_IP);
> @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
>   break;
>  #ifdef INET6
>   case AF_INET6:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in nd6_resolve() */
>   error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   eh->ether_type = htons(ETHERTYPE_IPV6);
> @@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct 
>   break;
>  #ifdef INET6
>   case AF_INET6:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in nd6_resolve() */
>   error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   break;
>  #endif
>   case AF_INET:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in arpresolve() */
>   error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   break;
> @@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
>   case ETHERTYPE_PPPOE:
>   if (m->m_flags & (M_MCAST | M_BCAST))
>   goto dropanyway;
> + KERNEL_LOCK();
>  

Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Alexander Bluhm
On Wed, Jul 21, 2021 at 07:53:55PM +0200, Hrvoje Popovski wrote:
> i've applied this and ipsec crypto no queue diff and i'm getting
> splasserts below ... maybe it's something obvious, if not, i will try
> diff by diff ..

Ahh, to many diffs in my tree.  I have forgotten the cunk
crp->crp_flags = ... | CRYPTO_F_NOQUEUE

Try this.  Still testing it myself, it looks a bit faster.

bluhm

Index: crypto/crypto.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/crypto/crypto.c,v
retrieving revision 1.84
diff -u -p -r1.84 crypto.c
--- crypto/crypto.c 21 Jul 2021 11:11:41 -  1.84
+++ crypto/crypto.c 21 Jul 2021 19:51:14 -
@@ -387,23 +387,32 @@ crypto_unregister(u_int32_t driverid, in
 int
 crypto_dispatch(struct cryptop *crp)
 {
-   struct taskq *tq = crypto_taskq;
-   int error = 0, s;
+   int error = 0, lock = 1, s;
u_int32_t hid;
 
s = splvm();
hid = (crp->crp_sid >> 32) & 0x;
if (hid < crypto_drivers_num) {
if (crypto_drivers[hid].cc_flags & CRYPTOCAP_F_MPSAFE)
-   tq = crypto_taskq_mpsafe;
+   lock = 0;
}
splx(s);
 
-   if ((crp->crp_flags & CRYPTO_F_NOQUEUE) == 0) {
+   /* XXXSMP crypto_invoke() is not MP safe */
+   lock = 1;
+
+   if (crp->crp_flags & CRYPTO_F_NOQUEUE) {
+   if (lock)
+   KERNEL_LOCK();
+   error = crypto_invoke(crp);
+   if (lock)
+   KERNEL_UNLOCK();
+   } else {
+   struct taskq *tq;
+
+   tq = lock ? crypto_taskq : crypto_taskq_mpsafe;
task_set(>crp_task, (void (*))crypto_invoke, crp);
task_add(tq, >crp_task);
-   } else {
-   error = crypto_invoke(crp);
}
 
return error;
@@ -424,6 +433,8 @@ crypto_invoke(struct cryptop *crp)
if (crp == NULL || crp->crp_callback == NULL)
return EINVAL;
 
+   KERNEL_ASSERT_LOCKED();
+
s = splvm();
if (crp->crp_ndesc < 1 || crypto_drivers == NULL) {
crp->crp_etype = EINVAL;
@@ -535,11 +546,16 @@ void
 crypto_done(struct cryptop *crp)
 {
crp->crp_flags |= CRYPTO_F_DONE;
+
if (crp->crp_flags & CRYPTO_F_NOQUEUE) {
/* not from the crypto queue, wakeup the userland process */
crp->crp_callback(crp);
} else {
+   struct taskq *tq;
+
+   tq = (crp->crp_flags & CRYPTO_F_MPSAFE) ?
+   crypto_taskq_mpsafe : crypto_taskq;
task_set(>crp_task, (void (*))crp->crp_callback, crp);
-   task_add(crypto_taskq, >crp_task);
+   task_add(tq, >crp_task);
}
 }
Index: crypto/cryptodev.h
===
RCS file: /data/mirror/openbsd/cvs/src/sys/crypto/cryptodev.h,v
retrieving revision 1.73
diff -u -p -r1.73 cryptodev.h
--- crypto/cryptodev.h  9 Jul 2021 20:43:28 -   1.73
+++ crypto/cryptodev.h  21 Jul 2021 19:48:44 -
@@ -171,6 +171,7 @@ struct cryptop {
 
 #define CRYPTO_F_IMBUF 0x0001  /* Input/output are mbuf chains, otherwise 
contig */
 #define CRYPTO_F_IOV   0x0002  /* Input/output are uio */
+#define CRYPTO_F_MPSAFE0x0004  /* Do not use kernel lock for callback 
*/
 #define CRYPTO_F_NOQUEUE   0x0008  /* Don't use crypto queue/thread */
 #define CRYPTO_F_DONE  0x0010  /* request completed */
 
Index: netinet/ip_ah.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
retrieving revision 1.151
diff -u -p -r1.151 ip_ah.c
--- netinet/ip_ah.c 18 Jul 2021 14:38:20 -  1.151
+++ netinet/ip_ah.c 21 Jul 2021 19:51:14 -
@@ -684,7 +684,7 @@ ah_input(struct mbuf *m, struct tdb *tdb
 
/* Crypto operation descriptor. */
crp->crp_ilen = m->m_pkthdr.len; /* Total input length. */
-   crp->crp_flags = CRYPTO_F_IMBUF;
+   crp->crp_flags = CRYPTO_F_IMBUF | CRYPTO_F_MPSAFE | CRYPTO_F_NOQUEUE;
crp->crp_buf = (caddr_t)m;
crp->crp_callback = ipsec_input_cb;
crp->crp_sid = tdb->tdb_cryptoid;
@@ -699,9 +699,7 @@ ah_input(struct mbuf *m, struct tdb *tdb
memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
tc->tc_rpl = tdb->tdb_rpl;
 
-   KERNEL_LOCK();
error = crypto_dispatch(crp);
-   KERNEL_UNLOCK();
return error;
 
  drop:
@@ -1134,7 +1132,7 @@ ah_output(struct mbuf *m, struct tdb *td
 
/* Crypto operation descriptor. */
crp->crp_ilen = m->m_pkthdr.len; /* Total input length. */
-   crp->crp_flags = CRYPTO_F_IMBUF;
+   crp->crp_flags = CRYPTO_F_IMBUF | CRYPTO_F_MPSAFE | CRYPTO_F_NOQUEUE;
crp->crp_buf = (caddr_t)m;
crp->crp_callback = ipsec_output_cb;
crp->crp_sid = tdb->tdb_cryptoid;
@@ -1148,9 +1146,7 @@ 

Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Hrvoje Popovski
On 21.7.2021. 18:41, Alexander Bluhm wrote:
> On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote:
>> Hi, pipex(4) is also not ready for parallel access. In the chunk below
>> it will be accessed through (*ifp->if_input)() -> ether_input() ->
>> pipex_pppoe_input(). This looks not fatal but makes at least session
>> statistics inconsistent.
> For pipex and pppoe we can put a kernel lock into ether_resolve().
> I also put locks in mpls.
> 
> I am not aware of any other issues.

Hi,

i've applied this and ipsec crypto no queue diff and i'm getting
splasserts below ... maybe it's something obvious, if not, i will try
diff by diff ..

I'm running iked tunnel ...

log on box where traffic goes into tunnel

splassert: ipsp_process_done: want 2 have 0
Starting stack trace...
ipsp_process_done(fd80a8647400,811832a0) at
ipsp_process_done+0x59
esp_output_cb(811832a0,81194e00,fd80a8647400,21c,0)
at esp_output_cb+0x41
ipsec_output_cb(fd8397dad440) at ipsec_output_cb+0x177
taskq_thread(8013f480) at taskq_thread+0x81
end trace frame: 0x0, count: 253
End of stack trace.


splassert: ip_output: want 2 have 0
Starting stack trace...
ip_output(fd80a8647400,0,0,2,0,0) at ip_output+0x88
ipsp_process_done(fd80a8647400,811832a0) at
ipsp_process_done+0x384
esp_output_cb(811832a0,81194e00,fd80a8647400,21c,0)
at esp_o

 utput_cb+0x41
ipsec_output_cb(fd8397dad440) at ipsec_output_cb+0x177
taskq_thread(8013f480) at taskq_thread+0x81
end trace frame: 0x0, count: 252
End of stack trace.


logs on box where traffic goes from tunnel out
splassert: tdb_hash: want 2 have 0
Starting stack trace...
tdb_hash(36b50c1c,81199200,32) at tdb_hash+0x5c
gettdb_dir(0,36b50c1c,81199200,32,0) at gettdb_dir+0x83
ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x7a
taskq_thread(80140d80) at taskq_thread+0x81
end trace frame: 0x0, count: 253
End of stack trace.

splassert: esp_input_cb: want 2 have 0
Starting stack trace...
esp_input_cb(8118a5f8,81199200,fd80026f5500,0) at
esp_input_cb+0x73
ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x162
taskq_thread(80140d80) at taskq_thread+0x81
end trace frame: 0x0, count: 254
End of stack trace.

splassert: enc_getif: want 2 have 0
Starting stack trace...
enc_getif(0,0) at enc_getif+0x54
ipsec_common_input_cb(fd80026f5500,8118a5f8,14,9) at
ipsec_common_input_cb+0x934
esp_input_cb(8118a5f8,81199200,fd80026f5500,0) at
esp_input_cb+0x456
ipsec_input_cb(fd839e0c6ca0) at ipsec_input_cb+0x162
taskq_thread(80140d80) at taskq_thread+0x81
end trace frame: 0x0, count: 252
End of stack trace.



Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Alexander Bluhm
On Mon, Jul 19, 2021 at 07:33:55PM +0300, Vitaliy Makkoveev wrote:
> Hi, pipex(4) is also not ready for parallel access. In the chunk below
> it will be accessed through (*ifp->if_input)() -> ether_input() ->
> pipex_pppoe_input(). This looks not fatal but makes at least session
> statistics inconsistent.

For pipex and pppoe we can put a kernel lock into ether_resolve().
I also put locks in mpls.

I am not aware of any other issues.

Can we commit this now?

bluhm

Index: net/if.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
retrieving revision 1.643
diff -u -p -r1.643 if.c
--- net/if.c20 Jul 2021 16:32:28 -  1.643
+++ net/if.c21 Jul 2021 16:07:15 -
@@ -109,6 +109,10 @@
 #include 
 #endif
 
+#ifdef IPSEC
+#include 
+#endif
+
 #ifdef MPLS
 #include 
 #endif
@@ -238,7 +242,7 @@ int ifq_congestion;
 
 int netisr;
 
-#defineNET_TASKQ   1
+#defineNET_TASKQ   4
 struct taskq   *nettqmp[NET_TASKQ];
 
 struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
@@ -815,6 +819,7 @@ void
 if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
 {
struct mbuf *m;
+   int exclusive_lock = 0;
 
if (ml_empty(ml))
return;
@@ -835,15 +840,27 @@ if_input_process(struct ifnet *ifp, stru
 * lists and the socket layer.
 */
 
+#ifdef IPSEC
/*
 * XXXSMP IPsec data structures are not ready to be accessed
 * by multiple network threads in parallel.  In this case
 * use an exclusive lock.
 */
-   NET_LOCK();
+   if (ipsec_in_use)
+   exclusive_lock = 1;
+#endif
+   if (exclusive_lock)
+   NET_LOCK();
+   else
+   NET_RLOCK_IN_SOFTNET();
+
while ((m = ml_dequeue(ml)) != NULL)
(*ifp->if_input)(ifp, m);
-   NET_UNLOCK();
+
+   if (exclusive_lock)
+   NET_UNLOCK();
+   else
+   NET_RUNLOCK_IN_SOFTNET();
 }
 
 void
@@ -900,6 +917,12 @@ if_netisr(void *unused)
arpintr();
KERNEL_UNLOCK();
}
+#endif
+   if (n & (1 << NETISR_IP))
+   ipintr();
+#ifdef INET6
+   if (n & (1 << NETISR_IPV6))
+   ip6intr();
 #endif
 #if NPPP > 0
if (n & (1 << NETISR_PPP)) {
Index: net/if_ethersubr.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
retrieving revision 1.275
diff -u -p -r1.275 if_ethersubr.c
--- net/if_ethersubr.c  7 Jul 2021 20:19:01 -   1.275
+++ net/if_ethersubr.c  21 Jul 2021 16:34:51 -
@@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
 
switch (af) {
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IP);
@@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
eh->ether_type = htons(ETHERTYPE_IPV6);
@@ -271,13 +277,19 @@ ether_resolve(struct ifnet *ifp, struct 
break;
 #ifdef INET6
case AF_INET6:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in nd6_resolve() */
error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
 #endif
case AF_INET:
+   KERNEL_LOCK();
+   /* XXXSMP there is a MP race in arpresolve() */
error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
+   KERNEL_UNLOCK();
if (error)
return (error);
break;
@@ -528,12 +540,14 @@ ether_input(struct ifnet *ifp, struct mb
case ETHERTYPE_PPPOE:
if (m->m_flags & (M_MCAST | M_BCAST))
goto dropanyway;
+   KERNEL_LOCK();
 #ifdef PIPEX
if (pipex_enable) {
struct pipex_session *session;
 
if ((session = pipex_pppoe_lookup_session(m)) != NULL) {
pipex_pppoe_input(m, session);
+  

Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Martin Pieuchot
On 20/07/21(Tue) 15:46, Alexander Bluhm wrote:
> On Tue, Jul 20, 2021 at 02:26:02PM +0200, Alexander Bluhm wrote:
> > > Note that having multiple threads competing for an exclusive rwlock will
> > > generate unnecessary wakeup/sleep cycles every time the lock is released.
> > > It is valuable to keep this in mind as it might add extra latency when
> > > processing packets.
> > 
> > Of course.  What do you recommend?
> 
> We may have another alternative.
> 
> - Always use a shared net lock but also aquire kernel lock.

Using an exclusive NET_LOCK() seems better, that's what we have now.

My point is just that if we're using an exclusive NET_LOCK() we lose the
gain of having multiple softnet threads, so if we could reduce the
number of threads so would be better.  That said, if it isn't trivial to
do so, I'd better spend the time into making IPsec works with parallel
threads.

Another reason for not using an exclusive lock in the softnet thread is
to be able to execute read ioctls at the same time as the forwarding
path.  This has been reverted due to a bug elsewhere last year and never
got activated again.  But somebody might want to revisit this, 'cause
doing ifconfig(8) on a busy machine was hanging for a very long time.



Re: forwarding in parallel ipsec workaround

2021-07-21 Thread Martin Pieuchot
On 20/07/21(Tue) 14:26, Alexander Bluhm wrote:
> On Tue, Jul 20, 2021 at 10:08:09AM +0200, Martin Pieuchot wrote:
> > On 19/07/21(Mon) 17:53, Alexander Bluhm wrote:
> > > Hi,
> > > 
> > > I found why the IPsec workaround did not work.
> > > 
> > > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index +
> > > idx), but the workaround modifies net_tq() at runtime.  Modifying
> > > net_tq() at runtime is bad anyway as task_add() and task_del() could
> > > be called with different task queues.
> > > 
> > > So better use exclusive lock if IPsec is in use.  For me this is
> > > running stable.
> > 
> > Note that having multiple threads competing for an exclusive rwlock will
> > generate unnecessary wakeup/sleep cycles every time the lock is released.  
> > It is valuable to keep this in mind as it might add extra latency when
> > processing packets.
> 
> Of course.  What do you recommend?

What you find easier to make progress with :)

> - Develop outside of the tree until all problems are fixed.
> - Delay work on parallel forwarding until IPsec is MP safe.
> - Accept a possible slowdown of IPsec.  In my measurements it gets
>   faster even with the exclusive lock.
> - Concentrate on making IPsec faster.  By removing the crypto
>   queues you gain much more performance than the exclusive lock may
>   cost.  Did you see the massive kernel locks in my graph?
>   
> http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg

I didn't see this, nice to know.

> - Make ARP MP safe.  Currently we need the kernel lock there or
>   it crashes.  This creates latency for all kind of packets.
> - Convert the rwlock in pf to mutex.  I think your argument counts
>   much more there.  But I cannot prove it.

My argument about what?  Didn't we all agree about converting the rwlock
to a mutex for now?

> My plan is to commit what we have and improve where most pain is.
> This makes incremental steps easier.

Makes sense.



Re: forwarding in parallel ipsec workaround

2021-07-20 Thread Alexander Bluhm
On Tue, Jul 20, 2021 at 02:26:02PM +0200, Alexander Bluhm wrote:
> > Note that having multiple threads competing for an exclusive rwlock will
> > generate unnecessary wakeup/sleep cycles every time the lock is released.
> > It is valuable to keep this in mind as it might add extra latency when
> > processing packets.
> 
> Of course.  What do you recommend?

We may have another alternative.

- Always use a shared net lock but also aquire kernel lock.

This seems to be a quick fix for all the MP problmes when running
in parallel.  Then the kernel lock can be removed step by step.

I will create a diff an test.

bluhm

> - Develop outside of the tree until all problems are fixed.
> - Delay work on parallel forwarding until IPsec is MP safe.
> - Accept a possible slowdown of IPsec.  In my measurements it gets
>   faster even with the exclusive lock.
> - Concentrate on making IPsec faster.  By removing the crypto
>   queues you gain much more performance than the exclusive lock may
>   cost.  Did you see the massive kernel locks in my graph?
>   
> http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg
> - Make ARP MP safe.  Currently we need the kernel lock there or
>   it crashes.  This creates latency for all kind of packets.
> - Convert the rwlock in pf to mutex.  I think your argument counts
>   much more there.  But I cannot prove it.
> 
> My plan is to commit what we have and improve where most pain is.
> This makes incremental steps easier.
> 
> bluhm



Re: forwarding in parallel ipsec workaround

2021-07-20 Thread Alexander Bluhm
On Tue, Jul 20, 2021 at 10:08:09AM +0200, Martin Pieuchot wrote:
> On 19/07/21(Mon) 17:53, Alexander Bluhm wrote:
> > Hi,
> > 
> > I found why the IPsec workaround did not work.
> > 
> > At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index +
> > idx), but the workaround modifies net_tq() at runtime.  Modifying
> > net_tq() at runtime is bad anyway as task_add() and task_del() could
> > be called with different task queues.
> > 
> > So better use exclusive lock if IPsec is in use.  For me this is
> > running stable.
> 
> Note that having multiple threads competing for an exclusive rwlock will
> generate unnecessary wakeup/sleep cycles every time the lock is released.  
> It is valuable to keep this in mind as it might add extra latency when
> processing packets.

Of course.  What do you recommend?

- Develop outside of the tree until all problems are fixed.
- Delay work on parallel forwarding until IPsec is MP safe.
- Accept a possible slowdown of IPsec.  In my measurements it gets
  faster even with the exclusive lock.
- Concentrate on making IPsec faster.  By removing the crypto
  queues you gain much more performance than the exclusive lock may
  cost.  Did you see the massive kernel locks in my graph?
  
http://bluhm.genua.de/perform/results/latest/patch-sys-ip-multiqueue.1/btrace/ssh_perform%40lt13_iperf3_-c10.4.56.36_-P10_-t10_-R-btrace-kstack.0.svg
- Make ARP MP safe.  Currently we need the kernel lock there or
  it crashes.  This creates latency for all kind of packets.
- Convert the rwlock in pf to mutex.  I think your argument counts
  much more there.  But I cannot prove it.

My plan is to commit what we have and improve where most pain is.
This makes incremental steps easier.

bluhm

> > Index: net/if.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> > retrieving revision 1.642
> > diff -u -p -r1.642 if.c
> > --- net/if.c30 Jun 2021 13:23:33 -  1.642
> > +++ net/if.c19 Jul 2021 14:51:31 -
> > @@ -109,6 +109,10 @@
> >  #include 
> >  #endif
> >  
> > +#ifdef IPSEC
> > +#include 
> > +#endif
> > +
> >  #ifdef MPLS
> >  #include 
> >  #endif
> > @@ -238,7 +242,7 @@ int ifq_congestion;
> >  
> >  int netisr;
> >  
> > -#defineNET_TASKQ   1
> > +#defineNET_TASKQ   4
> >  struct taskq   *nettqmp[NET_TASKQ];
> >  
> >  struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
> > @@ -815,6 +819,7 @@ void
> >  if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
> >  {
> > struct mbuf *m;
> > +   int exclusive_lock = 0;
> >  
> > if (ml_empty(ml))
> > return;
> > @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru
> >  * to PF globals, pipex globals, unicast and multicast addresses
> >  * lists and the socket layer.
> >  */
> > -   NET_LOCK();
> > +
> > +   /*
> > +* XXXSMP IPsec data structures are not ready to be
> > +* accessed by multiple Network threads in parallel.
> > +*/
> > +   if (ipsec_in_use)
> > +   exclusive_lock = 1;
> > +   if (exclusive_lock)
> > +   NET_LOCK();
> > +   else
> > +   NET_RLOCK_IN_SOFTNET();
> > +
> > while ((m = ml_dequeue(ml)) != NULL)
> > (*ifp->if_input)(ifp, m);
> > -   NET_UNLOCK();
> > +
> > +   if (exclusive_lock)
> > +   NET_UNLOCK();
> > +   else
> > +   NET_RUNLOCK_IN_SOFTNET();
> >  }
> >  
> >  void
> > @@ -895,6 +915,12 @@ if_netisr(void *unused)
> > KERNEL_UNLOCK();
> > }
> >  #endif
> > +   if (n & (1 << NETISR_IP))
> > +   ipintr();
> > +#ifdef INET6
> > +   if (n & (1 << NETISR_IPV6))
> > +   ip6intr();
> > +#endif
> >  #if NPPP > 0
> > if (n & (1 << NETISR_PPP)) {
> > KERNEL_LOCK();
> > @@ -3311,17 +3337,14 @@ unhandled_af(int af)
> > panic("unhandled af %d", af);
> >  }
> >  
> > -/*
> > - * XXXSMP This tunable is here to work around the fact that IPsec
> > - * globals aren't ready to be accessed by multiple threads in
> > - * parallel.
> > - */
> > -int nettaskqs = NET_TASKQ;
> > -
> >  struct taskq *
> >  net_tq(unsigned int ifindex)
> >  {
> > struct taskq *t = NULL;
> > +   static int nettaskqs;
> > +
> > +   if (nettaskqs == 0)
> > +   nettaskqs = min(NET_TASKQ, ncpus);
> >  
> > t = nettqmp[ifindex % nettaskqs];
> >  
> > Index: net/if_ethersubr.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
> > retrieving revision 1.275
> > diff -u -p -r1.275 if_ethersubr.c
> > --- net/if_ethersubr.c  7 Jul 2021 20:19:01 -   1.275
> > +++ net/if_ethersubr.c  19 Jul 2021 14:32:48 -
> > @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
> >  
> > switch (af) {
> > case AF_INET:
> > +   

Re: forwarding in parallel ipsec workaround

2021-07-20 Thread Martin Pieuchot
On 19/07/21(Mon) 17:53, Alexander Bluhm wrote:
> Hi,
> 
> I found why the IPsec workaround did not work.
> 
> At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index +
> idx), but the workaround modifies net_tq() at runtime.  Modifying
> net_tq() at runtime is bad anyway as task_add() and task_del() could
> be called with different task queues.
> 
> So better use exclusive lock if IPsec is in use.  For me this is
> running stable.

Note that having multiple threads competing for an exclusive rwlock will
generate unnecessary wakeup/sleep cycles every time the lock is released.  
It is valuable to keep this in mind as it might add extra latency when
processing packets.

> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.642
> diff -u -p -r1.642 if.c
> --- net/if.c  30 Jun 2021 13:23:33 -  1.642
> +++ net/if.c  19 Jul 2021 14:51:31 -
> @@ -109,6 +109,10 @@
>  #include 
>  #endif
>  
> +#ifdef IPSEC
> +#include 
> +#endif
> +
>  #ifdef MPLS
>  #include 
>  #endif
> @@ -238,7 +242,7 @@ int   ifq_congestion;
>  
>  int   netisr;
>  
> -#define  NET_TASKQ   1
> +#define  NET_TASKQ   4
>  struct taskq *nettqmp[NET_TASKQ];
>  
>  struct task if_input_task_locked = TASK_INITIALIZER(if_netisr, NULL);
> @@ -815,6 +819,7 @@ void
>  if_input_process(struct ifnet *ifp, struct mbuf_list *ml)
>  {
>   struct mbuf *m;
> + int exclusive_lock = 0;
>  
>   if (ml_empty(ml))
>   return;
> @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru
>* to PF globals, pipex globals, unicast and multicast addresses
>* lists and the socket layer.
>*/
> - NET_LOCK();
> +
> + /*
> +  * XXXSMP IPsec data structures are not ready to be
> +  * accessed by multiple Network threads in parallel.
> +  */
> + if (ipsec_in_use)
> + exclusive_lock = 1;
> + if (exclusive_lock)
> + NET_LOCK();
> + else
> + NET_RLOCK_IN_SOFTNET();
> +
>   while ((m = ml_dequeue(ml)) != NULL)
>   (*ifp->if_input)(ifp, m);
> - NET_UNLOCK();
> +
> + if (exclusive_lock)
> + NET_UNLOCK();
> + else
> + NET_RUNLOCK_IN_SOFTNET();
>  }
>  
>  void
> @@ -895,6 +915,12 @@ if_netisr(void *unused)
>   KERNEL_UNLOCK();
>   }
>  #endif
> + if (n & (1 << NETISR_IP))
> + ipintr();
> +#ifdef INET6
> + if (n & (1 << NETISR_IPV6))
> + ip6intr();
> +#endif
>  #if NPPP > 0
>   if (n & (1 << NETISR_PPP)) {
>   KERNEL_LOCK();
> @@ -3311,17 +3337,14 @@ unhandled_af(int af)
>   panic("unhandled af %d", af);
>  }
>  
> -/*
> - * XXXSMP This tunable is here to work around the fact that IPsec
> - * globals aren't ready to be accessed by multiple threads in
> - * parallel.
> - */
> -int   nettaskqs = NET_TASKQ;
> -
>  struct taskq *
>  net_tq(unsigned int ifindex)
>  {
>   struct taskq *t = NULL;
> + static int nettaskqs;
> +
> + if (nettaskqs == 0)
> + nettaskqs = min(NET_TASKQ, ncpus);
>  
>   t = nettqmp[ifindex % nettaskqs];
>  
> Index: net/if_ethersubr.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if_ethersubr.c,v
> retrieving revision 1.275
> diff -u -p -r1.275 if_ethersubr.c
> --- net/if_ethersubr.c7 Jul 2021 20:19:01 -   1.275
> +++ net/if_ethersubr.c19 Jul 2021 14:32:48 -
> @@ -222,7 +222,10 @@ ether_resolve(struct ifnet *ifp, struct 
>  
>   switch (af) {
>   case AF_INET:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in arpresolve() */
>   error = arpresolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   eh->ether_type = htons(ETHERTYPE_IP);
> @@ -245,7 +248,10 @@ ether_resolve(struct ifnet *ifp, struct 
>   break;
>  #ifdef INET6
>   case AF_INET6:
> + KERNEL_LOCK();
> + /* XXXSMP there is a MP race in nd6_resolve() */
>   error = nd6_resolve(ifp, rt, m, dst, eh->ether_dhost);
> + KERNEL_UNLOCK();
>   if (error)
>   return (error);
>   eh->ether_type = htons(ETHERTYPE_IPV6);
> Index: net/ifq.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
> retrieving revision 1.44
> diff -u -p -r1.44 ifq.c
> --- net/ifq.c 9 Jul 2021 01:22:05 -   1.44
> +++ net/ifq.c 19 Jul 2021 14:32:48 -
> @@ -243,7 +243,7 @@ void
>  ifq_init(struct ifqueue *ifq, struct ifnet *ifp, unsigned int idx)
>  {
>   ifq->ifq_if = ifp;
> - ifq->ifq_softnet = 

Re: forwarding in parallel ipsec workaround

2021-07-19 Thread Hrvoje Popovski
On 19.7.2021. 17:53, Alexander Bluhm wrote:
> Hi,
> 
> I found why the IPsec workaround did not work.
> 
> At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index +
> idx), but the workaround modifies net_tq() at runtime.  Modifying
> net_tq() at runtime is bad anyway as task_add() and task_del() could
> be called with different task queues.
> 
> So better use exclusive lock if IPsec is in use.  For me this is
> running stable.


Hi,

i can't trigger panic with this diff. I've tried with isakmpd and with
iked ...
With this diff traffic through tunnel seems little slower ...



Re: forwarding in parallel ipsec workaround

2021-07-19 Thread Vitaliy Makkoveev
On Mon, Jul 19, 2021 at 05:53:40PM +0200, Alexander Bluhm wrote:
> Hi,
> 
> I found why the IPsec workaround did not work.
> 
> At init time we set ifiq->ifiq_softnet = net_tq(ifp->if_index +
> idx), but the workaround modifies net_tq() at runtime.  Modifying
> net_tq() at runtime is bad anyway as task_add() and task_del() could
> be called with different task queues.
> 
> So better use exclusive lock if IPsec is in use.  For me this is
> running stable.
> 
> bluhm
> 

Hi, pipex(4) is also not ready for parallel access. In the chunk below
it will be accessed through (*ifp->if_input)() -> ether_input() ->
pipex_pppoe_input(). This looks not fatal but makes at least session
statistics inconsistent.

> @@ -834,10 +839,25 @@ if_input_process(struct ifnet *ifp, stru
>* to PF globals, pipex globals, unicast and multicast addresses
>* lists and the socket layer.
>*/
> - NET_LOCK();
> +
> + /*
> +  * XXXSMP IPsec data structures are not ready to be
> +  * accessed by multiple Network threads in parallel.
> +  */
> + if (ipsec_in_use)
---^

So I like to check at least `pipe_enable' here. Or introduce sessions
counter `pipex_in_use' for better performance.

> + exclusive_lock = 1;
> + if (exclusive_lock)
> + NET_LOCK();
> + else
> + NET_RLOCK_IN_SOFTNET();
> +
>   while ((m = ml_dequeue(ml)) != NULL)
>   (*ifp->if_input)(ifp, m);
> - NET_UNLOCK();