On 22/04/21(Thu) 15:08, Mark Kettenis wrote:
> > Date: Thu, 22 Apr 2021 14:43:24 +0200
> > From: Alexandr Nedvedicky <[email protected]>
> >
> > Hello,
> >
> > On Thu, Apr 22, 2021 at 01:09:34PM +0200, Alexander Bluhm wrote:
> > > On Thu, Apr 22, 2021 at 12:33:13PM +0200, Hrvoje Popovski wrote:
> > > > r620-1# papnpaiancini:cc :p :op
> > > > opooolo_llc_ac_caccahhceh_ei_eti_tieetmme_mm__amgamigacigci__cc_hccehhcekcekc::
> > > > k :m bmubmfubfuppflp llc pc pcuup uf rfferree eel el iilsitss tm tom
> > > > omddoidfiiifeifeidde:d ::i ti etietmme m
> > > > a daddardd rd0 r0
> > > > xx0fxfffffffffffffddf88d08c0cc0c6c76afc9b3f04500400++01+61 610 6x0
> > > > fx0fxffffffffffdffdf88d08
> > > > 00020720d72a8c0049703eb!ef!e==!0=x009x59x95995b9ebbaee3ae3ae344ef54f5a4bff7db07990a9
> > >
> > > Wow.  3 CPUs panic in pool_cache_get() pool_cache_item_magic_check
> > > simultaneously.  This makes me think we may have a bug there.
> > >
> >
> > I took a look at arch/amd64/include/intrdefs.h where interrupt
> > priorities are defined.
> >
> > IPL_NET has priority set to 7,
> > IPL_SOFTNET has higher priority set 5
> >
> > all allocations are coming from mbpool via m_gethdr(), interrupt
> > level priority for mbpool is set to IPL_NET. If I understand
> > code in m_pool_get() right, then the pool_cache_enter() does not
> > stop guys who call m_gethdr() with IPL_SOFTNET.
> >
> > if we put KERNEL_LOCK() there the problem is gone, mostlikely
> > because the IPL_SOFTNET guy waits for KERNEL_LOCK therefore it
> > can not interfere with our IPL_NET task, which forwards packet.
> >
> > I admit it's a poor speculation, I have no 'hard proof' for my
> > claim here. So I might be very wrong here.
>
> Not sure what you are trying to say here, but IPL_SOFTNET is lower
> than IPL_NET.  So code that runs at IPL_SOFTNET will raise the IPL to
> IPL_NET in pool_cache_enter(), blocking IPL_NET interrupts until
> pool_cache_leave() is called and the IPL is lowered again to
> IPL_SOFTNET.

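
To make the ordering described above concrete, here is a toy user-space
model.  It is only a sketch, not kernel code: the model_* names are
made up for illustration, the single global stands in for the per-CPU
IPL, and the values 5 and 7 are simply the ones quoted from intrdefs.h
earlier in this thread.

```c
#include <assert.h>

/* Toy model only.  Levels as quoted in the thread for amd64. */
enum { IPL_NONE = 0, IPL_SOFTNET = 5, IPL_NET = 7 };

static int cur_ipl = IPL_NONE;	/* stands in for the per-CPU current IPL */

/* Raise to at least `ipl' and return the previous level; never lowers. */
static int
model_splraise(int ipl)
{
	int old = cur_ipl;

	if (ipl > cur_ipl)
		cur_ipl = ipl;
	return old;
}

/* Restore a previously saved level. */
static void
model_splx(int old)
{
	cur_ipl = old;
}

/* An interrupt at `ipl' is held off while the current level is >= ipl. */
static int
model_blocked(int ipl)
{
	return ipl <= cur_ipl;
}

/*
 * Code running at IPL_SOFTNET enters a section protected at IPL_NET.
 * Returns nonzero if IPL_NET interrupts were blocked inside the section.
 */
static int
model_softnet_enters_pool(void)
{
	int s, was_blocked;

	cur_ipl = IPL_SOFTNET;
	assert(!model_blocked(IPL_NET));	/* NET can interrupt SOFTNET */

	s = model_splraise(IPL_NET);		/* enter: raise to IPL_NET */
	was_blocked = model_blocked(IPL_NET);
	model_splx(s);				/* leave: back to IPL_SOFTNET */

	assert(cur_ipl == IPL_SOFTNET);
	return was_blocked;
}
```

Calling model_softnet_enters_pool() returns nonzero: inside the
section the level is 7, so a level-7 interrupt on the same CPU is held
off, and afterwards the level is back at 5.  The model says nothing
about other CPUs.
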
IPL_SOFTNET is only blocking timeouts that don't run in a thread
context.  It is mostly a legacy of the time when received packets were
processed in a soft-interrupt context.

> I'm fairly confident the "normal" pools are mpsafe; we have been using
> those in concurrent contexts without holding the kernel lock for a
> long time already.  But the pool cache layer is still relatively new...

That said, no other subsystem in the kernel is currently as
multi-threaded as a network stack with 4 threads.  So it is possible
that existing races are more likely to show up with this diff,
