Re: makesyscalls
Python is essentially uncrosscompilable and its maintainers have repeatedly rudely rejected efforts to make it so. If that weren't the case, and the way installed Python modules were "managed" (I use the term liberally) were made sane, I'd think it were a fine thing to use in base. But it is the case, and that won't be made sane, and so I think it belongs nowhere near NetBSD. Could you translate your prototype into a different language, one that's less basically hostile to our build system and its goals and benefits?
re: makesyscalls
i'm not very interested in a solution that doesn't use tools available to the build. you've not shown that there is sufficient pain here to force an external solution. i'm not sure i buy your claims about awk and size of program. IME, it just requires that one is strict to rules. if you want to use python for creating files in the tree for _our_ things, then i think you have to propose adding python to src/tools. [*] why aren't you willing to discuss a lua version? it has most of the features you complained awk is missing, and would make it relatively easy to unit-test the components easily. .mrg. [*] i support this generally, and as a non-public visible library that our gdb can use as well. i do not think we should add it to /usr/bin (but i might be convinceable that it can be installed in a non-standard location, using non- standard paths for libararies, as long as it does not have anything to do with or interfere with pkgsrc or other ways of using python.)
Re: makesyscalls
On 10.06.2020 01:13, David Holland wrote: > The question is: do we want the Python version in the tree now For this, I would say "NO", at least as long Python is out of base and IMO it shall not be there. But it is fine to put into othersrc/. On 10.06.2020 01:13, David Holland wrote: > Rewriting in C is a possible future step. The code generator I have in > mind going forward should not be done in Python. But again, more on > that later. I would like to have mksyscalls (and at some point makeioctls) much more flexible and as a tool scriptable. I had to iterate a dozen of times over all our syscalls in various fuzzers, sanitizers, debuggers, tracers etc. Something that is very needed is knowing the full serialized struct passed as a pointer to each syscall. It's a lot of work to teach the tool about it, but it could be finally centralized and time saved of repeatedly teaching all the other programs about this property of syscalls. signature.asc Description: OpenPGP digital signature
makesyscalls
Between various forms of prompting I got tired of waiting for a gsoc student to take up code generation for syscall argument handling. We should really be generating all the calls to copyin/copyout, not to mention all the compat32 and compat_* glue, not maintaining them by hand. More on that later. In the meantime the first step on this was to assess the current makesyscalls.sh and decide whether to (a) leave it alone as a separate mechanism, (b) integrate it into the new stuff, or (c) ignore it and write new stuff from scratch. My conclusion after wading into it is that between additions for rump, dtrace, and other things it's become quite a bit too large for an awk script. (Experience suggests that the limit is about 1000 lines, depending on how well structured the script is and what it's doing. Beyond this point, between there being no variable declarations, function arguments being unchcked, and other properties of awk, it becomes difficult to modify the script safely and alterations made regardless tend to resort to cutpaste and make the situation worse. This script is 1200 lines and it's definitely showing signs of deterioration.) This means that regarding (a) it shouldn't be left alone because it's itself becoming a problem. Meanwhile, regarding (c), it does too many things and has too many tentacles to be safely ignored or easily reimplemented. Consequently, as a step towards (b) I have translated it into Python; Python is still untyped but is substantially more robust than awk and has a decidedly larger size limit. The question is: do we want the Python version in the tree now, or should I just treat it as an intermediate development prototype that nobody has to see? Arguments pro: - this is definitely a step forward for maintainability - having it in the tree means other people can work on it readily - having it in the tree means it's there even if my future plans get derailed or don't materialize Arguments con: - it's Python and we don't have Python in base (and don't particularly want it) - Python being Python all the code paths I haven't managed to test by the time it's committed will probably crash the first time they're reached - definitely a risk of having introduced bugs (also found some, but so far all of them have been cosmetic things that awk doesn't trap on, like passing extra arguments to printf) Arguments I'm not interested in listening to: - it's Python and Python is a terrible language (might be so but see below) - it should be Lua (don't see any reason Lua beats awk for this) - sh/awk is fine for this and you're doing it wrong (no I'm not, I think I have as much experience using sh and awk on nontrivial things as anyone) There are not many languages to choose from: in base we have sh, awk, lua, and C (or C++), and of languages not in base I think Python is the only one sufficiently ubiquitous to justify using it for important build infrastructure. Well, maybe Perl is, but I'm trying to make the script _more_ maintainable :-) Note before flaming that it doesn't actually run during builds, only when someone regenerates the outputs after changing syscalls.master. Needing to have Python installed for this is not a showstopper. Rewriting in C is a possible future step. The code generator I have in mind going forward should not be done in Python. But again, more on that later. -- David A. Holland dholl...@netbsd.org
Re: TSC improvement
On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote: > It's great to see improvements to our calibration of the TSC (and I > tend to agree that cpu_counter should be serializing, so that, e.g., > cpu_counter(); ...; cpu_counter() reliably measures time taken in the > ellipsis). I'm pretty sure we want to have both variants. Joerg
Re: pg_jobc going negative?
The FreeBSD refactoring LGTM. It also simplifies the code. christos > On Jun 9, 2020, at 2:11 PM, Robert Elz wrote: > >Date:Tue, 9 Jun 2020 17:04:54 +0200 >From:Kamil Rytarowski >Message-ID: > > > | Yes... syzkaller had like 12 different ways to reproduce it. > > OK, thanks. > > | There is still a race and we randomly go to negative pg_jobc. > > I am not at all surprised... > > I will look at it over the next couple of days. No guarantees... > > kre signature.asc Description: Message signed with OpenPGP
Re: pg_jobc going negative?
Date:Tue, 9 Jun 2020 17:04:54 +0200 From:Kamil Rytarowski Message-ID: | Yes... syzkaller had like 12 different ways to reproduce it. OK, thanks. | There is still a race and we randomly go to negative pg_jobc. I am not at all surprised... I will look at it over the next couple of days. No guarantees... kre
Re: TSC improvement
It's great to see improvements to our calibration of the TSC (and I tend to agree that cpu_counter should be serializing, so that, e.g., cpu_counter(); ...; cpu_counter() reliably measures time taken in the ellipsis). At the same time, I wonder whether we should _also_: 1. Modify the tsc timecounter so that it uses a global atomic to ensure that there is a global view of time as counted by the tsc. This is what the timecounter(9) API per se expects of timecounters, and right now tsc (along with various other per-CPU cycle counters) fails to guarantee that. 2. Introduce an API for a local timecounter -- a per-CPU timecounter that never goes backwards on a single CPU, but: (a) measures units of wall clock time, unlike cpu_counter(); (b) need not be synchronized between CPUs; and (c) may be cheaper to read than a global timecounter. We could then use that for, e.g., rusage calculations.
TSC improvement
Hi, all. I wrote a code to improve x86's TSC.x86/tsc.c rev. 1.67 reduced cache problem, but it still has room. I measured the effect of lfence, mfence, cpuid and rdtscp. The impact to TSC skew and/or drift is: AMD: mfence > rdtscp > cpuid > lfence-serialize > lfence = nomodify Intel: lfence > rdtscp > cpuid > nomodify So, mfence is the best on AMD and lfence is the best on Intel. If it has no SSE2, we can use cpuid. The logs of TSC calibration (from "dmesg -t |grep TSC") is at: http://www.netbsd.org/~msaitoh/tsc/tsc-20200605log.tgz Diff is at: http://www.netbsd.org/~msaitoh/tsc/tsc-20200609-0.dif In this diff, cpu_counter*(those functions are MI API) uses serializing. We can provide both cpu_counter() and cpu_counter_serializing(). I think almost all usecases requires serializing on x86. For RNG, serializing is not required but the overhead is trivial. I think users who use cpu_counter() use the function to get time difference precisely. The serializing is required to get precised value, so I think just adding it to x86's cpu_counter() is enough. If it's acceptable, I'd like to commit this change. Any comments/advice are welcomed. NOTE: - An AMD's document says DE_CFG_LFENCE_SERIALIZE bit can be used for serializing, but it's not so good. - On Intel i386(not amd64), it seems the improvement is very little. - rdtscp instruct can be used as serializing instruction + rdtsc, but it's not good as [lm]fence. Both Intel and AMD's document say that the latency of rdtscp is bigger than rdtsc, so I suspect the difference of the result comes from it. And, thanks ad@, knakahara@, nonaka@ and some other people to help this work. -- --- SAITOH Masanobu (msai...@execsw.org msai...@netbsd.org)
Re: pg_jobc going negative?
On 09.06.2020 16:35, Robert Elz wrote: > Date:Tue, 9 Jun 2020 14:13:56 +0200 > From:Kamil Rytarowski > Message-ID: <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org> > > | Here is the simplest reproducer crashing the kernel on negative pg_jobc: > > I have not looked at this closely yet, but this is likely because > ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc > more or less expect to be a constant. > > Is there any known reproducer of this problem which does not involve ptrace() > ? > Yes... syzkaller had like 12 different ways to reproduce it. As far as I can tell, in the syzkaller case they all are about races. One of them is here: https://syzkaller.appspot.com/text?tag=ReproC=128060f610 After adding the asserts, all look similar to me: forking + setpgid(0, 0). > At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I > haven't investigated enough to be able to spot a definite problem yet > (possible ptrace() generated issue aside - and yes, those need to work as > well). > > I doubt very much that adding a new mutex will make a difference, all the > manipulations are done with proc_lock held, which is kind of the "big lock" > for process manipulation - adding finer grained locking might improve > performance, by improving concurrency, but is unlikely (at this stage, > nothing is impossible) to be a fix for this problem. > There is still a race and we randomly go to negative pg_jobc. > kre >
Re: pg_jobc going negative?
Date:Tue, 9 Jun 2020 14:13:56 +0200 From:Kamil Rytarowski Message-ID: <85d5e51f-afd1-1038-fd68-2366ff073...@netbsd.org> | Here is the simplest reproducer crashing the kernel on negative pg_jobc: I have not looked at this closely yet, but this is likely because ptrace() fiddles p_pptr which the routines that manipulate the pg_jobc more or less expect to be a constant. Is there any known reproducer of this problem which does not involve ptrace() ? At first glance, the manipulations of pg_jobc looks a bit dodgy to me, but I haven't investigated enough to be able to spot a definite problem yet (possible ptrace() generated issue aside - and yes, those need to work as well). I doubt very much that adding a new mutex will make a difference, all the manipulations are done with proc_lock held, which is kind of the "big lock" for process manipulation - adding finer grained locking might improve performance, by improving concurrency, but is unlikely (at this stage, nothing is impossible) to be a fix for this problem. kre
Re: pg_jobc going negative?
On 09.06.2020 08:38, Maxime Villard wrote: >> Should we allow pg_jobc going negative? > > I don't think so, the code is not designed to expect negative values. Here is the simplest reproducer crashing the kernel on negative pg_jobc: http://netbsd.org/~kamil/ptrace/pg_jobc-crash.c On 09.06.2020 08:38, Maxime Villard wrote: > In short, (1) my understanding of it is that the code is not designed to > expect negative values, and (2) since I added the KASSERTs 10/11 of the > random bugs didn't trigger. Big signs the bug is indeed related to > refcounting. > > It would be nice if someone with better understanding than me of the lwp > group stuff could have a look, though. Generally, pg_jobc looks like a ref counting mechanism. FreeBSD reworked the code and added a dedicated pgrp mutex. I don't know which path is the best for us, especially regarding the SMP properties. + ad@
Re: pg_jobc going negative?
On 09.06.2020 10:23, Michael van Elst wrote: > m...@m00nbsd.net (Maxime Villard) writes: > >> You can see they are all different, but all have to do with reading the >> group pointer, which was either freed, overwritten, not initialized, >> unmapped, or contained pure garbage. This is typical of refcounting bugs >> where a resource disappears under your feet. > > pg_jobc is not a reference counter. The assertion probably stopped > a bug in a different place by coincidence. > As the first step, is it fine to replace all pg_jobc == 0/ != 0 checks with pg_jobc > 0 / <= 0? This should not make anything worse than it is now. The remaining code assumes that pg_jobc never goes below 0. And then, follow up with the removal of the assert. We will check with syzkaller whether the races/crashes are gone. signature.asc Description: OpenPGP digital signature
Re: pg_jobc going negative?
m...@m00nbsd.net (Maxime Villard) writes: >You can see they are all different, but all have to do with reading the >group pointer, which was either freed, overwritten, not initialized, >unmapped, or contained pure garbage. This is typical of refcounting bugs >where a resource disappears under your feet. pg_jobc is not a reference counter. The assertion probably stopped a bug in a different place by coincidence. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: pg_jobc going negative?
mo...@rodents-montreal.org (Mouse) writes: >> pg_jobc is a process group struct member counting the number of >> processes with [...] >> [W]e caught issues that pg_jobc can go negative. [...] >> Is going negative in the first place a real bug? >I would say, yes. A "count of processes with [condition]" cannot ever >truly be negative. Unfortunately it doesn't "count processes with [condition]", that's more a coincidence on how the mechanism is used. -- -- Michael van Elst Internet: mlel...@serpens.de "A potential Snark may lurk in every tree."
Re: pg_jobc going negative?
Le 09/06/2020 à 02:49, Kamil Rytarowski a écrit : pg_jobc is a process group struct member counting the number of processes with a parent capable of doing job control. Once reaching 0, the process group is orphaned. With the addition of asserts checking for pg_jobc > 0 (by maxv@), we caught issues that pg_jobc can go negative. I have landed new ATF tests that trigger this reliably with ptrace(2) (src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was originally triggered with GDB. There are also other sources of these asserts due to races The ptrace(2) ATF tests are reliable triggering a negative pg_jobc, however there are racy tests doing the same as triggered by syzkaller (mentioned at least in [1]). Should we allow pg_jobc going negative? I don't think so, the code is not designed to expect negative values. (Other BSDs allow this.) They don't "allow it", they just don't have a KASSERT against it. Is going negative in the first place a real bug? There were 11 bugs reported by syzbot which showed severe memory corruption in this area. While investigating the issues, I looked at the recfounting stuff and added KASSERTs for sanity checking, and then realized they were actually firing when using the different reproducers generated by syzbot. Since I put these KASSERTs, 10 of the 11 random bugs didn't trigger, and instead it is the KASSERTs that fire earlier. Listed as duplicates: https://syzkaller.appspot.com/bug?id=50a4ddd341b90cf15a4814048ff51db04347279a You can see they are all different, but all have to do with reading the group pointer, which was either freed, overwritten, not initialized, unmapped, or contained pure garbage. This is typical of refcounting bugs where a resource disappears under your feet. Only one keeps firing once in a while (KASAN); that's understandable, because even though there is a big window where the KASSERTs can fire, the underlying race is still there and there is still the chance we miss the window and get memory corruptions. In short, (1) my understanding of it is that the code is not designed to expect negative values, and (2) since I added the KASSERTs 10/11 of the random bugs didn't trigger. Big signs the bug is indeed related to refcounting. It would be nice if someone with better understanding than me of the lwp group stuff could have a look, though. Maxime