On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > if_clone_create() has the races caused by context switch.
> >
> > Can you share a backtrace of such race? Where does the kernel panic?
> >
>
> This diff was inspired by thread . As I explained  here is 3
> issues that cause panics produced by command below:
>
> ---- cut begin ----
> for i in 1 2 3; do while true; do ifconfig bridge0 create& \
> ifconfig bridge0 destroy& done& done
> ---- cut end ----

Thanks, I couldn't reproduce it on any of the machines I tried. Did you
manage to reproduce it with other pseudo-devices or just with bridge0?

> My system was stable with the last diff I did for thread . But since
> this final diff  which include fixes for tun(4) is quick and dirty
> and not for commit I decided to make the diff to fix the races caused by
> if_clone_create() at first.
>
> I included screenshot with panic.

Thanks, interesting that the corruption happens on a list that should be
initialized. Does that mean the context switch on Thread 1 is happening
before if_attach_common() is called? You said in your previous email that
there's a context switch. Do you know when it happens? You could see
that in ddb by looking at the backtrace of the other CPU.

Is the context switch leading to the race common to all pseudo-drivers,
or is it specific to the bridge(4) driver?

Regarding your solution, do I understand correctly that the goal is to
serialize all if_clone_create() calls? Is it really necessary to remember
which unit is currently being created, or can't we just serialize all of
them?

The fact that a lock is not held over the cloning operation is imho
positive.
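
To make the two options in the question above concrete, here is a rough
userland sketch. It is not kernel code and not the diff under discussion:
the pthread mutex, the unit_exists[] and unit_busy[] tables, and
driver_create() are all hypothetical stand-ins. The first function
serializes every create behind one lock; the second only remembers which
unit is in progress, so no lock is held while the driver's create routine
runs.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_UNITS 8

/* Hypothetical state; the kernel keeps interfaces on lists instead. */
static bool unit_exists[MAX_UNITS];
static bool unit_busy[MAX_UNITS];
static pthread_mutex_t clone_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a driver's create routine, which may sleep. */
static void
driver_create(int unit)
{
	(void)unit;
}

/* Option A: serialize all creates; the lock is held across the create. */
int
clone_create_serialized(int unit)
{
	int error = 0;

	if (unit < 0 || unit >= MAX_UNITS)
		return EINVAL;

	pthread_mutex_lock(&clone_lock);
	if (unit_exists[unit]) {
		error = EEXIST;
	} else {
		driver_create(unit);
		unit_exists[unit] = true;
	}
	pthread_mutex_unlock(&clone_lock);
	return error;
}

/* Option B: mark the unit as being created, then drop the lock. */
int
clone_create_per_unit(int unit)
{
	if (unit < 0 || unit >= MAX_UNITS)
		return EINVAL;

	pthread_mutex_lock(&clone_lock);
	if (unit_exists[unit] || unit_busy[unit]) {
		pthread_mutex_unlock(&clone_lock);
		return EEXIST;
	}
	unit_busy[unit] = true;
	pthread_mutex_unlock(&clone_lock);

	/* No lock held here; other units can be created concurrently. */
	driver_create(unit);

	pthread_mutex_lock(&clone_lock);
	unit_exists[unit] = true;
	unit_busy[unit] = false;
	pthread_mutex_unlock(&clone_lock);
	return 0;
}
```

In this model, option A is simpler but holds the lock for the whole create,
while option B needs the extra "busy" bookkeeping in exchange for not
holding a lock over the cloning operation.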