On 21.4.2021. 23:28, Hrvoje Popovski wrote:
> On 21.4.2021. 21:36, Alexander Bluhm wrote:
>> Hi,
>>
>> For a while we have been running the network stack without the kernel
>> lock, but with a network lock. The latter is an exclusive sleeping rwlock.
>>
>> It is possible to run the forwarding path in parallel on multiple
>> cores. I use ix(4) interfaces which provide one input queue for
>> each CPU. For that we have to start multiple softnet tasks and
>> replace the exclusive lock with a shared lock. This works for IP
>> and IPv6 input and forwarding, but not for higher protocols.
>>
>> So I implement a queue between IP and higher layers. We had that
>> before when we were using netlock for IP and kernel lock for TCP.
>> Now we have shared lock for IP and exclusive lock for TCP. By using
>> a queue, we can upgrade the lock once for multiple packets.
>>
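(A rough sketch of the batching idea described above, as I read it. This is
a single-threaded toy model and not the diff itself: `struct model_lock` only
counts acquisitions instead of locking anything, and every name below is
made up for illustration.)

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_MAX 64

/* Hypothetical packet: only records whether upper layers must see it. */
struct model_pkt {
	int id;
	int local;	/* 1 = needs protocol input (e.g. TCP), 0 = forward */
};

/* Stand-in for the net lock; it counts acquisitions, nothing more. */
struct model_lock {
	unsigned int shared_taken;
	unsigned int exclusive_taken;
};

/* Queue between the IP layer and the higher protocol layers. */
struct model_queue {
	struct model_pkt *slot[MODEL_QUEUE_MAX];
	size_t len;
};

/*
 * IP stage: one shared acquisition covers the whole input batch.
 * Forwarded packets are finished here; local ones are only queued.
 * Returns the number of packets forwarded.
 */
size_t
model_input_batch(struct model_lock *lk, struct model_queue *q,
    struct model_pkt *pkts, size_t n)
{
	size_t i, forwarded = 0;

	lk->shared_taken++;
	for (i = 0; i < n; i++) {
		if (!pkts[i].local)
			forwarded++;
		else if (q->len < MODEL_QUEUE_MAX)
			q->slot[q->len++] = &pkts[i];
		/* else: queue is full, the packet is dropped */
	}
	return forwarded;
}

/*
 * Protocol stage: one exclusive acquisition covers every queued packet.
 * Returns the number of packets handed to the upper layers.
 */
size_t
model_proto_drain(struct model_lock *lk, struct model_queue *q)
{
	size_t n = q->len;

	assert(n <= MODEL_QUEUE_MAX);
	if (n == 0)
		return 0;
	lk->exclusive_taken++;
	/* upper-layer processing of q->slot[0 .. n-1] would happen here */
	q->len = 0;
	return n;
}
```

Calling model_input_batch() on a burst of packets and then
model_proto_drain() once leaves exclusive_taken at 1 however many local
packets were in the burst. That is the amortisation the queue buys.
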
>> As you can see here, forwarding performance doubles from 4.5x10^9
>> to 9x10^9. Left column is current, right column is with my diff.
>> The other dots at 2x10^9 are with socket splicing which is not
>> affected.
>> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/gnuplot/forward.png
>>
>> Here are all numbers with various network tests.
>> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/perform.html
>> TCP performance gets less deterministic due to the additional queue.
>>
>> Kernel stack flame graph looks like this. Machine uses 4 CPU.
>> http://bluhm.genua.de/files/kstack-multiqueue-forward.svg
>>
>> Note the kernel lock around nd6_resolve(). I had to put it there
>> as I have seen an MP related crash there. This can be fixed
>> independently of this diff.
>>
>> We need more MP pressure to find such bugs and races. I think now
>> is a good time to give this diff broader testing and commit it.
>> You need interfaces with multiple queues to see a difference.
>>
>> ok?
> Hi,
>
> with this diff I'm getting a panic when I'm pushing traffic over that box.
> This is plain forwarding. Should I compile with witness?

Here it is with witness:
x3550m4# panic: pool_cache_item_magic_check: mbufpl cpu free list modified: item addr 0xfffffd8066b5e500+16 0xfffffd8066b5e570!=0x1474deeb99bfdf06
Stopped at db_enter+0x10: popq %rbp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*211939 58019 0 0x14000 0x200 1 softnet
173790 68166 0 0x14000 0x200 3 softnet
45539 46127 0 0x14000 0x200 2 softnet
358228 28782 0 0x14000 0x200 4 softnet
db_enter() at db_enter+0x10
panic(ffffffff81df726e) at panic+0x12a
pool_cache_get(ffffffff82203378) at pool_cache_get+0x25b
pool_get(ffffffff82203378,2) at pool_get+0x5e
m_clget(0,2,802) at m_clget+0xdd
ixgbe_get_buf(ffff80000015c9f8,a) at ixgbe_get_buf+0xa3
ixgbe_rxfill(ffff80000015c9f8) at ixgbe_rxfill+0x93
ixgbe_queue_intr(ffff80000011aec0) at ixgbe_queue_intr+0x4f
intr_handler(ffff800026df9740,ffff8000000cc500) at intr_handler+0x6e
Xintr_ioapic_edge0_untramp() at Xintr_ioapic_edge0_untramp+0x18f
ip_forward(fffffd8066b58400,ffff80000015a048,fffffd878909fa80,0) at ip_forward+0x1de
ip_input_if(ffff800026df9a38,ffff800026df9a44,4,0,ffff80000015a048) at ip_input_if+0x608
ipv4_input(ffff80000015a048,fffffd8066b58400) at ipv4_input+0x39
if_input_process(ffff80000015a048,ffff800026df9ab8) at if_input_process+0x6f
end trace frame: 0xffff800026df9b00, count: 0
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{1}> show panic
pool_cache_item_magic_check: mbufpl cpu free list modified: item addr 0xfffffd8066b5e500+16 0xfffffd8066b5e570!=0x1474deeb99bfdf06
ddb{1}> trace
db_enter() at db_enter+0x10
panic(ffffffff81df726e) at panic+0x12a
pool_cache_get(ffffffff82203378) at pool_cache_get+0x25b
pool_get(ffffffff82203378,2) at pool_get+0x5e
m_clget(0,2,802) at m_clget+0xdd
ixgbe_get_buf(ffff80000015c9f8,a) at ixgbe_get_buf+0xa3
ixgbe_rxfill(ffff80000015c9f8) at ixgbe_rxfill+0x93
ixgbe_queue_intr(ffff80000011aec0) at ixgbe_queue_intr+0x4f
intr_handler(ffff800026df9740,ffff8000000cc500) at intr_handler+0x6e
Xintr_ioapic_edge0_untramp() at Xintr_ioapic_edge0_untramp+0x18f
ip_forward(fffffd8066b58400,ffff80000015a048,fffffd878909fa80,0) at ip_forward+0x1de
ip_input_if(ffff800026df9a38,ffff800026df9a44,4,0,ffff80000015a048) at ip_input_if+0x608
ipv4_input(ffff80000015a048,fffffd8066b58400) at ipv4_input+0x39
if_input_process(ffff80000015a048,ffff800026df9ab8) at if_input_process+0x6f
ifiq_process(ffff80000015ef00) at ifiq_process+0x69
taskq_thread(ffff800000030300) at taskq_thread+0x9f
end trace frame: 0x0, count: -16
ddb{1}> show locks
shared rwlock netlock r = 0 (0xffffffff82119770)
#0 witness_lock+0x339
#1 if_input_process+0x43
#2 ifiq_process+0x69
#3 taskq_thread+0x9f
#4 proc_trampoline+0x1c
shared rwlock softnet r = 0 (0xffff800000030370)
#0 witness_lock+0x339
#1 taskq_thread+0x92
#2 proc_trampoline+0x1c
ddb{1}> show all locks
CPU 3:
exclusive mutex softnet r = 0 (0xffff800000030228)
#0 witness_lock+0x339
#1 mtx_enter_try+0x95
#2 mtx_enter+0x48
#3 msleep+0xe5
#4 taskq_next_work+0x61
#5 taskq_thread+0xce
#6 proc_trampoline+0x1c
Process 58019 (softnet) thread 0xffff800026dc87e0 (211939)
shared rwlock netlock r = 0 (0xffffffff82119770)
#0 witness_lock+0x339
#1 if_input_process+0x43
#2 ifiq_process+0x69
#3 taskq_thread+0x9f
#4 proc_trampoline+0x1c
shared rwlock softnet r = 0 (0xffff800000030370)
#0 witness_lock+0x339
#1 taskq_thread+0x92
#2 proc_trampoline+0x1c
Process 28782 (softnet) thread 0xffff800026dc8000 (358228)
shared rwlock softnet r = 0 (0xffff800000030070)
#0 witness_lock+0x339
#1 taskq_thread+0x92
#2 proc_trampoline+0x1c
ddb{1}> show all pools
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
arp 64 14 0 0 1 0 1 1 0 8 0
plcache 128 120 0 0 4 0 4 4 0 8 0
rtpcb 120 21 0 20 1 0 1 1 0 8 0
rtentry 112 52 0 0 2 0 2 2 0 8 0
unpcb 120 66 0 18 2 0 2 2 0 8 0
tcpcb 736 7 0 2 1 0 1 1 0 8 0
inpcb 304 205 0 197 1 0 1 1 0 8 0
art_heap8 4096 1 0 0 1 0 1 1 0 8 0
art_heap4 256 120 0 0 8 0 8 8 0 8 0
art_table 32 121 0 0 1 0 1 1 0 8 0
art_node 16 52 0 0 1 0 1 1 0 8 0
dirhash 1024 87 0 40 6 0 6 6 0 8 0
newdirblk 32 16 0 16 1 1 0 1 0 8 0
dirrem 64 1646 0 1646 27 27 0 27 0 8 0
mkdir 56 20 0 20 1 1 0 1 0 8 0
diradd 56 1658 0 1657 23 22 1 23 0 8 0
freefile 48 1627 0 1627 20 20 0 20 0 8 0
freeblks 192 1647 0 1647 81 81 0 81 0 8 0
freefrag 64 15 0 15 2 2 0 1 0 8 0
allocindir 104 10941 0 10941 236 225 11 211 0 8 11
indirdep 56 16 0 16 1 0 1 1 0 8 1
allocdir 128 2755 0 2755 76 75 1 76 0 8 1
bmsafemap 64 46 0 46 2 2 0 1 0 8 0
newblk 64 13696 0 13696 3 3 0 1 0 8 0
inodedep 160 1709 0 1707 70 68 2 70 0 8 1
pagedep 128 32 0 31 1 0 1 1 0 8 0
dino1pl 128 5313 0 1633 121 2 119 119 0 8 0
ffsino 272 5313 0 1633 250 4 246 246 0 8 0
nchpl 144 5602 0 2398 123 4 119 119 0 8 0
uvmvnodes 72 5340 0 0 98 0 98 98 0 8 0
vnodes 224 5340 0 0 315 0 315 315 0 8 0
namei 1024 17921 0 17921 3 2 1 1 0 8 1
percpumem 96 31 0 0 1 0 1 1 0 8 0
ehcixfer 296 175 0 170 1 0 1 1 0 8 0
scxspl 216 35538 0 35538 22 21 1 8 0 8 1
plimitpl 152 25 0 12 1 0 1 1 0 8 0
sigapl 424 426 0 376 9 2 7 8 0 8 0
futexpl 56 5484 0 5484 2 2 0 1 0 8 0
knotepl 112 99 0 38 2 0 2 2 0 8 0
kqueuepl 168 9 0 0 1 0 1 1 0 8 0
pipepl 336 105 0 105 4 4 0 1 0 8 0
fdescpl 496 395 0 376 5 1 4 5 0 8 0
filepl 152 7300 0 7201 5 0 5 5 0 8 0
lockfpl 104 4 0 4 1 1 0 1 0 8 0
lockfspl 48 2 0 2 1 1 0 1 0 8 0
sessionpl 144 11 0 1 1 0 1 1 0 8 0
pgrppl 48 17 0 7 1 0 1 1 0 8 0
ucredpl 96 62 0 43 1 0 1 1 0 8 0
zombiepl 144 376 0 376 3 3 0 1 0 8 0
processpl 1080 426 0 376 5 0 5 5 0 8 0
procpl 672 487 0 437 7 1 6 6 0 8 0
sockpl 432 292 0 235 7 0 7 7 0 8 0
mcl12k 12288 25 0 0 3 0 3 3 0 8 0
mcl4k 4096 1 0 0 1 0 1 1 0 8 0
mcl2k2 2112 644 0 0 43 0 43 43 0 8 0
mcl2k 2048 21 0 0 3 0 3 3 0 8 0
mtagpl 96 17 0 0 1 0 1 1 0 8 0
mbufpl 256 683 0 0 43 0 43 43 0 8 0
bufpl 280 49145 0 10072 2793 1 2792 2792 0 8 0
anonpl 24 141669 0 136823 234 28 206 231 0 3034 161
amapchunkpl 152 10572 0 10249 106 32 74 99 0 158 56
amappl16 200 826 0 822 22 21 1 15 0 8 0
amappl15 192 120 0 109 1 0 1 1 0 8 0
amappl14 184 9 0 9 2 2 0 1 0 8 0
amappl13 176 33 0 32 1 0 1 1 0 8 0
amappl12 168 88 0 82 4 3 1 4 0 8 0
amappl11 160 103 0 75 2 0 2 2 0 8 0
amappl10 152 11 0 11 2 2 0 1 0 8 0
amappl9 144 26 0 26 1 1 0 1 0 8 0
amappl8 136 899 0 886 13 12 1 13 0 8 0
amappl7 128 108 0 107 1 0 1 1 0 8 0
amappl6 120 353 0 324 5 4 1 4 0 8 0
amappl5 112 190 0 173 3 2 1 3 0 8 0
amappl4 104 1741 0 1695 23 20 3 22 0 8 0
amappl3 96 561 0 546 10 9 1 7 0 8 0
amappl2 88 2974 0 2814 36 30 6 29 0 8 0
amappl1 80 10296 0 9610 20 2 18 19 0 8 0
amappl 88 3970 0 3831 21 15 6 20 0 92 0
dma16384 16384 3 0 3 1 1 0 1 0 8 0
dma4096 4096 7 0 1 1 0 1 1 0 8 0
dma2048 2048 26 0 26 11 10 1 1 0 8 1
dma1024 1024 22 0 22 11 10 1 1 0 8 1
dma512 512 269 0 269 11 10 1 1 0 8 1
dma256 256 7 0 7 1 1 0 1 0 8 0
dma128 128 64 0 64 1 1 0 1 0 8 0
dma64 64 12 0 12 11 10 1 1 0 8 1
dma32 32 12 0 12 1 1 0 1 0 8 0
dma16 16 1 0 1 1 1 0 1 0 8 0
aobjpl 64 2 0 0 1 0 1 1 0 8 0
uaddrrnd 24 395 0 376 1 0 1 1 0 8 0
uaddrbest 32 2 0 0 1 0 1 1 0 8 0
uaddr 24 395 0 376 1 0 1 1 0 8 0
vmmpekpl 168 11751 0 11728 2 0 2 2 0 8 0
vmmpepl 168 47352 0 45611 332 35 297 330 0 357 204
vmsppl 368 394 0 376 3 0 3 3 0 8 0
rwobjpl 56 16755 0 15640 70 49 21 70 0 8 0
pdppl 4096 798 0 752 86 40 46 72 0 8 0
pvpl 32 622565 0 611384 1291 821 470 1260 0 265 352
pmappl 232 394 0 376 2 0 2 2 0 8 0
extentpl 40 271 0 182 1 0 1 1 0 8 0
phpool 112 445 0 82 11 0 11 11 0 8 0
ddb{1}> ps
PID TID PPID UID S FLAGS WAIT COMMAND
5155 486169 1 0 3 0x100083 ttyin ksh
21708 285930 1 0 3 0x100098 poll cron
62392 40931 25406 720 3 0x90 kqread lldpd
25406 20242 1 0 3 0x80 netio lldpd
5100 287713 79347 95 3 0x100092 kqread smtpd
46028 334535 79347 103 3 0x100092 kqread smtpd
54804 375444 79347 95 3 0x100092 kqread smtpd
3303 249576 79347 95 3 0x100092 kqread smtpd
97819 271021 79347 95 3 0x100092 kqread smtpd
3054 297952 79347 95 3 0x100092 kqread smtpd
79347 446793 1 0 3 0x100080 kqread smtpd
12533 505387 1 0 3 0x80 select sshd
30114 19756 1 0 3 0x100080 poll ntpd
7481 97074 94918 83 3 0x100092 poll ntpd
94918 433415 1 83 3 0x100092 poll ntpd
32160 146866 10207 73 3 0x100090 kqread syslogd
10207 206610 1 0 3 0x100082 netio syslogd
57814 94760 0 0 3 0x14200 bored smr
17640 263297 0 0 3 0x14200 pgzero zerothread
65686 290981 0 0 3 0x14200 aiodoned aiodoned
63357 193771 0 0 3 0x14200 syncer update
2489 279176 0 0 3 0x14200 cleaner cleaner
43206 463626 0 0 3 0x14200 reaper reaper
12693 95421 0 0 3 0x14200 pgdaemon pagedaemon
85510 318479 0 0 3 0x14200 bored crynlk
49385 33947 0 0 3 0x14200 bored crypto
3669 424498 0 0 3 0x14200 usbtsk usbtask
16215 282320 0 0 3 0x14200 usbatsk usbatsk
72714 194617 0 0 3 0x40014200 acpi0 acpi0
97166 158891 0 0 7 0x40014200 idle11
78318 444428 0 0 7 0x40014200 idle10
25961 75337 0 0 7 0x40014200 idle9
11485 363757 0 0 7 0x40014200 idle8
14568 345381 0 0 7 0x40014200 idle7
24392 17964 0 0 7 0x40014200 idle6
62703 151012 0 0 7 0x40014200 idle5
14769 92091 0 0 3 0x40014200 idle4
22913 159035 0 0 3 0x40014200 idle3
38081 22603 0 0 3 0x40014200 idle2
42810 217564 0 0 3 0x40014200 idle1
29482 62812 0 0 3 0x14200 bored sensors
*58019 211939 0 0 7 0x14200 softnet
68166 173790 0 0 7 0x14200 softnet
46127 45539 0 0 7 0x14200 softnet
28782 358228 0 0 7 0x14200 softnet
57025 242861 0 0 3 0x14200 bored systqmp
84257 166355 0 0 3 0x14200 bored systq
3048 213029 0 0 3 0x40014200 bored softclock
37973 15869 0 0 7 0x40014200 idle0
1 164030 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb{1}> trace /p 0x58019
and then the box froze.
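
For anyone reading along who has not met this kind of panic before: the
message says that a word inside an item sitting on a free list no longer
holds the value the allocator expected, which usually means something wrote
to memory after it was freed. Below is a minimal sketch of that class of
check. It is not the OpenBSD pool code: the struct layout, the XOR scheme,
and all names are assumptions made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header written into an item while it sits on the free list. */
struct model_item {
	struct model_item *next;
	uintptr_t magic;	/* expected to equal secret ^ item address */
};

/* Value the magic word must hold for this item (assumed scheme). */
static uintptr_t
model_magic(const struct model_item *it, uintptr_t secret)
{
	return secret ^ (uintptr_t)it;
}

/* Free an item: stamp it and push it onto the list. */
void
model_put(struct model_item **head, struct model_item *it, uintptr_t secret)
{
	it->next = *head;
	it->magic = model_magic(it, secret);
	*head = it;
}

/*
 * Allocate an item: verify the stamp before trusting the list pointers.
 * Returns the item, or NULL if the list is empty or the item was modified
 * while free; *corrupt is set to 1 in the latter case.
 */
struct model_item *
model_get(struct model_item **head, uintptr_t secret, int *corrupt)
{
	struct model_item *it = *head;

	*corrupt = 0;
	if (it == NULL)
		return NULL;
	if (it->magic != model_magic(it, secret)) {
		*corrupt = 1;	/* a real allocator would panic here */
		return NULL;
	}
	*head = it->next;
	return it;
}
```

In this toy version a stray write to a freed item is only noticed on the
next model_get(), so the code that trips the check (here the ix(4) rx refill
path) need not be the code that corrupted the memory.
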