On 09.06.2020 16:35, Robert Elz wrote:
>     Date:        Tue, 9 Jun 2020 14:13:56 +0200
>     From:        Kamil Rytarowski <ka...@netbsd.org>
>     Message-ID:  <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org>
>
>   | Here is the simplest reproducer crashing the kernel on negative pg_jobc:
>
> I have not looked at this closely yet, but this is likely because
> ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc
> more or less expect to be a constant.
>
> Is there any known reproducer of this problem which does not involve ptrace()
> ?
>

Yes... syzkaller had like 12 different ways to reproduce it. As far as I
can tell, in the syzkaller case they all are about races.

One of them is here:

https://syzkaller.appspot.com/text?tag=ReproC&x=128060f6100000

After adding the asserts, all look similar to me: forking + setpgid(0, 0).

> At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I
> haven't investigated enough to be able to spot a definite problem yet
> (possible ptrace() generated issue aside - and yes, those need to work as
> well).
>
> I doubt very much that adding a new mutex will make a difference, all the
> manipulations are done with proc_lock held, which is kind of the "big lock"
> for process manipulation - adding finer grained locking might improve
> performance, by improving concurrency, but is unlikely (at this stage,
> nothing is impossible) to be a fix for this problem.
>

There is still a race and we randomly go to negative pg_jobc.

> kre
>
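
For reference, the "forking + setpgid(0, 0)" shape mentioned above can be
sketched as follows. This is only a minimal illustration of the pattern,
not the syzkaller reproducer, and it is not expected to trigger the
negative pg_jobc by itself, since the reports appear to depend on a race.
The function name fork_and_setpgid and the MAX_CHILDREN limit are
hypothetical, chosen for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64	/* hypothetical cap, only for this sketch */

/*
 * Fork up to nchildren processes. Each child moves itself into a new
 * process group with setpgid(0, 0) and exits with 0 on success.
 * Returns the number of children reaped with exit status 0.
 */
int
fork_and_setpgid(int nchildren)
{
	pid_t pids[MAX_CHILDREN];
	int forked = 0, ok = 0;

	if (nchildren > MAX_CHILDREN)
		nchildren = MAX_CHILDREN;

	for (int i = 0; i < nchildren; i++) {
		pid_t pid = fork();

		if (pid == -1)
			break;	/* fork failed; reap what we already have */
		if (pid == 0) {
			/* Child: become the leader of its own process group. */
			_exit(setpgid(0, 0) == 0 ? 0 : 1);
		}
		pids[forked++] = pid;
	}

	/* Parent: reap exactly the children created above. */
	for (int i = 0; i < forked; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	}
	return ok;
}
```

Calling fork_and_setpgid(4) from a small main() is expected to return 4
when every fork succeeds. Each child changes its process group while its
parent stays in the old one, which is the kind of transition where the
job control count of a process group may need to be adjusted.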