Re: NVMM missing opcode REPE CMPS implementation, version 2
On 31/10/2020 at 23:58, Reinoud Zandijk wrote: Dear folks, I've reworked the patch using the feedback I got from Maxime and had to refactor some minor parts of the other code. I'm in the process of writing the test cases now, but would like feedback on this solution in the meantime. It should now provide a proper REPE memory assist implementation. It could still use some minor enhancements, though. With regards, Reinoud

Note that the main NVMM tree is not NetBSD anymore. Also I am no longer a NetBSD developer, please stop spamming me.
Re: NVMM missing opcode REPE CMPS implementation
On 29/10/2020 at 21:54, Reinoud Zandijk wrote: Dear folks, I stumbled on an instruction that NVMM couldn't decode and thus couldn't emulate either. It was, as stated, the REPE CMPS (0xF3A7) instruction, as described in https://c9x.me/x86/html/file_module_x86_id_279.html and confirmed by disassembly with ndisasm (from nasm). Appended is the implementation of the mentioned instruction together with its byte-sized sibling 0xF3A6. When installing the modified libnvmm, qemu behaves like NVMM is not used. I think the implementation does the right thing, but feel free to double-check! Tested and found on NetBSD/amd64 9.99.74 (19 Oct) on an Intel Celeron 2957U by executing:

	qemu-system-x86_64 -accel nvmm -nographic -netdev \
	    user,id=n0,tftp=/usr/mdec,bootfile=pxeboot_ia32.bin -device \
	    e1000,netdev=n0 -boot n

With regards, Reinoud

This is incorrect and you should revert. x86_emul_cmp deals with single memory operands, not double. And rdi+rsi must be incremented/decremented depending on PSL_D. Also you added printfs in the library, wtf? As a general rule each instruction that libnvmm can emulate should have unit tests in t_mem_assist -- in fact here a single test case would have shown that the code is obviously wrong. Maxime
Re: pg_jobc going negative?
On 09/06/2020 at 02:49, Kamil Rytarowski wrote: pg_jobc is a process group struct member counting the number of processes with a parent capable of doing job control. Once it reaches 0, the process group is orphaned. With the addition of asserts checking for pg_jobc > 0 (by maxv@), we caught issues where pg_jobc can go negative. I have landed new ATF tests that trigger this reliably with ptrace(2) (src/tests/lib/libc/sys/t_ptrace_fork_wait.h r.1.7). The problem was originally triggered with GDB. There are also other sources of these asserts due to races. The ptrace(2) ATF tests trigger a negative pg_jobc reliably; however, there are racy tests doing the same, as triggered by syzkaller (mentioned at least in [1]). Should we allow pg_jobc to go negative? I don't think so; the code is not designed to expect negative values. (Other BSDs allow this.)

They don't "allow it", they just don't have a KASSERT against it. Is going negative in the first place a real bug? There were 11 bugs reported by syzbot which showed severe memory corruption in this area. While investigating the issues, I looked at the refcounting stuff and added KASSERTs for sanity checking, and then realized they were actually firing when using the different reproducers generated by syzbot. Since I put these KASSERTs in, 10 of the 11 random bugs didn't trigger, and instead it is the KASSERTs that fire earlier. Listed as duplicates: https://syzkaller.appspot.com/bug?id=50a4ddd341b90cf15a4814048ff51db04347279a You can see they are all different, but all have to do with reading the group pointer, which was either freed, overwritten, not initialized, unmapped, or contained pure garbage. This is typical of refcounting bugs where a resource disappears under your feet.
Only one keeps firing once in a while (KASAN); that's understandable, because even though there is a big window where the KASSERTs can fire, the underlying race is still there and there is still the chance we miss the window and get memory corruptions. In short, (1) my understanding of it is that the code is not designed to expect negative values, and (2) since I added the KASSERTs 10/11 of the random bugs didn't trigger. Big signs the bug is indeed related to refcounting. It would be nice if someone with better understanding than me of the lwp group stuff could have a look, though. Maxime
Re: ipfilter crash in latest netbsd-9
On 01/06/2020 at 03:06, Emmanuel Dreyfus wrote: Emmanuel Dreyfus wrote: After upgrading to 9.0, I experienced crashes when enabling ipfilter (backtrace below). I tried the latest netbsd-9 kernel without improvement. In the meantime, I filed kern/55236 about the ipfilter regression in NetBSD 9.0 i386 Xen. It happens with i386 XEN3PAE_DOMU kernels when ipfilter is loaded as a module. If I add back pseudo-device ipfilter in the kernel, the problem vanishes. Maxime, you made the change quite a long time ago. Did nobody report an issue? Would you have an idea of where this bug stems from? http://mail-index.netbsd.org/source-changes/2018/08/01/msg097228.html

I don't have time to check precisely right now, but if the problem arises in a module and not built-in, then it probably is a #define mishap. Try to rebuild the module with -DXEN and -DPAE in CFLAGS.
fault(4)
[I am not subscribed to this list, so if you want to answer, make sure to CC me]

In order to explore error branches, and to test the kernel's ability to cope with failures, it is often necessary to hard-trigger such failures. Here is an implementation [1] of fault(4), a driver that allows triggering failures in the kernel. A similar driver exists in Linux.

The fault_inject() function decides whether to return true or false, depending on parameters configurable by userland via ioctls on /dev/fault. The caller of this function should then error out depending on the return value. Typically:

	whatever_subsystem()
	{
		...
		if (fault_inject())
			return NULL; // means failure
		...
		return non_null; // means success
	}

Several modes can be available; I have implemented one for now, the N-th mode: every N-th call to fault_inject() (N being configurable) will make it return true. Several scopes are available: global (i.e. system-wide), or calling LWP. Examples:

 - mode=NTH scope=GLOBAL: every N-th call to fault_inject() in the whole kernel will return true, regardless of the LWP.
 - mode=NTH scope=LWP: every N-th call to fault_inject() made by the LWP that enabled the mode will return true. For the other LWPs, fault_inject() always returns false.

fault_inject() can be introduced in any place of interest. For now I added it in pool_cache_get():

	if (flags & PR_NOWAIT) {
		if (fault_inject())
			return NULL;
	}

Running ATF with kASan+LOCKDEBUG+fault with {N=32 scope=GLOBAL} already gives an instant crash:

	kernel diagnostic assertion "radix_tree_empty_tree_p(&pmap->pm_pvtree)" failed: file ".../sys/arch/x86/x86/pmap.c"

Looks like radixtree.c doesn't handle allocation failures very well somewhere. fault(4) seems like the kind of feature that would be useful for stress-testing and fuzzing. As you can see in the diff, its code is extremely simple. Maxime

[1] https://m00nbsd.net/garbage/fault/fault.diff
bus_dmamap_sync: bug?
Seems to me there is a bug in x86's bus_dmamap_sync(). When the DMA buffer is a UIO, and we bounce, the 'offset' argument is not taken into account properly. We should copy uio+off into bouncebuf+off, but we copy uio+0 into bouncebuf+off. A quick glance shows that other architectures have the same problem, like ARM32's _bus_dma_uiomove(). Maxime
Re: EFI memory map
On 02/01/2020 at 16:55, Emmanuel Dreyfus wrote: And indeed, studying the crash in ddb shows it happens when accessing a physical address that is excluded by x86_fake_clusters() but included by the EFI memory map.

Note that x86_fake_clusters() is unsafe. It does not exclude the MMIO pages, because only the BIOS can tell where they are. These pages can get returned by uvm_pagealloc, triggering all sorts of crazy behavior, potentially physically nuking the machine. I think we should remove this function. Maxime
Re: Proposal: removing urio(4), Rio 500 MP3 player (1999), and Rio-related packages
On 02/01/2020 at 19:29, Jason Thorpe wrote: On Jan 2, 2020, at 10:16 AM, co...@sdf.org wrote: Hi folks, urio(4) is a driver for the Rio 500, an MP3 player released in 1999. It recently received some discussion due to problems detected during maxv's USB fuzzing. Well, not exactly, the concerns were around the permissions of /dev/urio, and following the discussion it turned out that the driver was a good candidate for nuking. Yah, seems like a good candidate for nuking.

Two other drivers were also discussed:
 - uyurex, which is currently disabled, and has been disabled forever. My concern with this driver is that it has buffer overflows. The developer who imported it is no longer in TNF. This driver is supposed to handle exotic body sensor devices apparently sold in Japan around 2008.
 - uscanner, which was brought up by other people for an unrelated reason. It was removed from FreeBSD in 2009, from OpenBSD in 2013, and disabled in NetBSD in 2016. It has been superseded by ugen+SANE.

I'll let Maya decide whether to formulate proposals for these two additional drivers. Maxime
Re: [filemon] CVS commit: htdocs/support/security
On 17/12/2019 at 15:44, Andrew Doran wrote: Typically with a character device, the kmod can get unloaded while an ioctl is being executed on it. That's solvable. Conceptually I think the main stumbling block is that there are two layers at play which need to be reconciled: specfs and devsw. It could also be an opportunity to lessen the distinction between block and character devices; there's no real need for cached access from userspace, and that would simplify things too. When it comes to syscalls, I haven't looked closely, but the issue is likely the same. It's atomic and side-effect free if done correctly. We have pleasing examples of this. This is hard to get right though, so naturally mistakes are made.

Now that I'm looking at the syscall_{dis}establish() functions in detail, I see that the l_sysent checking is indeed a good idea and does not require refcounts; however, the accesses on 'sy_call' should use relaxed atomics. The cost of these atomics is my concern. Maxime
Re: [filemon] CVS commit: htdocs/support/security
On 17/12/2019 at 12:34, Kamil Rytarowski wrote: On 17.12.2019 09:16, Maxime Villard wrote: Module Name: htdocs Committed By: christos Date: Tue Dec 17 01:03:49 UTC 2019 Modified Files: htdocs/support/security: advisory.html index.html Log Message: new advisory To generate a diff of this commit: cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. There is something I don't understand here. Why keep this totally useless misfeature, when we already have many tracing facilities that can do just about the same job without having security issues? The recommendation in the advisory is literally to remove the kernel module from the fs. I don't see what could possibly be the use of such a misfeature as filemon; I would remove it completely from the kernel source tree. Maxime From: http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc "Additionally the way filemon does filesystem interception is racy and can lead to random crashes if the system calls are in use while the module is unloaded." Is this issue fixable? Not speaking for filemon in particular, I find this ability to rewrite the syscall table on the fly to be a feature. Keeping a functional module with this property (even if disabled by default) seems useful to me.

As far as I can tell, there are many races caused by autoloading. Typically with a character device, the kmod can get unloaded while an ioctl is being executed on it. When it comes to syscalls, I haven't looked closely, but the issue is likely the same. You can use tricks to "narrow down" the races; e.g. in NVMM, I use a global 'nmachines' variable, which prevents unloading in ~most cases. But I see no way to fix these races, except using atomic refcounts and mutexes on the ioctls and syscalls; obviously, this won't scale.
I find this autoloading stuff to be a misfeature, too. If we want something on by default, then we should put it in GENERIC; if we want it disabled but accessible, then we should put it in a kmod that requires a manual modload from root. Putting stuff in kmods AND having the kernel load them automatically serves no purpose; it just adds bugs, and sometimes creates the wrong feeling that a driver is disabled while it isn't (like filemon). Other than that, I don't really understand your point; you can still do syscall interception with a custom kmod if you want, regardless of filemon. Maxime
Re: [filemon] CVS commit: htdocs/support/security
On 17/12/2019 at 14:32, Kamil Rytarowski wrote: On 17.12.2019 14:19, Maxime Villard wrote: On 17/12/2019 at 12:34, Kamil Rytarowski wrote: On 17.12.2019 09:16, Maxime Villard wrote: Module Name: htdocs Committed By: christos Date: Tue Dec 17 01:03:49 UTC 2019 Modified Files: htdocs/support/security: advisory.html index.html Log Message: new advisory To generate a diff of this commit: cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files. There is something I don't understand here. Why keep this totally useless misfeature, when we already have many tracing facilities that can do just about the same job without having security issues? The recommendation in the advisory is literally to remove the kernel module from the fs. I don't see what could possibly be the use of such a misfeature as filemon; I would remove it completely from the kernel source tree. Maxime From: http://ftp.netbsd.org/pub/NetBSD/security/advisories/NetBSD-SA2019-006.txt.asc "Additionally the way filemon does filesystem interception is racy and can lead to random crashes if the system calls are in use while the module is unloaded." Is this issue fixable? Not speaking for filemon in particular, I find this ability to rewrite the syscall table on the fly to be a feature. Keeping a functional module with this property (even if disabled by default) seems useful to me. As far as I can tell, there are many races caused by autoloading. Typically with a character device, the kmod can get unloaded while an ioctl is being executed on it. When it comes to syscalls, I haven't looked closely, but the issue is likely the same. You can use tricks to "narrow down" the races; e.g. in NVMM, I use a global 'nmachines' variable, which prevents unloading in ~most cases.
But I see no way to fix these races, except using atomic refcounts and mutexes on the ioctls and syscalls; obviously, this won't scale. I find this autoloading stuff to be a misfeature, too. If we want something on by default, then we should put it in GENERIC; if we want it disabled but accessible, then we should put it in a kmod that requires a manual modload from root. Putting stuff in kmods AND having the kernel load them automatically serves no purpose; it just adds bugs, and sometimes creates the wrong feeling that a driver is disabled while it isn't (like filemon). Other than that, I don't really understand your point; you can still do syscall interception with a custom kmod if you want, regardless of filemon. Out of filemon, I am only interested in syscall_hijack (filemon_wrapper_install() + filemon_wrapper_deinstall()). If that can be made reliable, I would keep it in src/sys/modules/example. https://nxr.netbsd.org/xref/src/sys/dev/filemon/filemon_wrapper.c#90 This code is big cancer, and I'd rather not make it an example. I would prefer an example with syscall_{dis}establish(). I have no personal interest in the rest of filemon. Switching this make(1) feature to at least ptrace(2) should be straightforward. Yes, and that would be a lot more useful and reliable than filemon. What's clear nevertheless is that now that we've stopped installing the filemon kmod and told users to actually remove the file, make+filemon has no remaining use. I will prepare the removal of filemon. Maxime
[filemon] CVS commit: htdocs/support/security
Module Name:	htdocs
Committed By:	christos
Date:		Tue Dec 17 01:03:49 UTC 2019

Modified Files:
	htdocs/support/security: advisory.html index.html

Log Message:
new advisory

To generate a diff of this commit:
cvs rdiff -u -r1.140 -r1.141 htdocs/support/security/advisory.html
cvs rdiff -u -r1.173 -r1.174 htdocs/support/security/index.html

Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.

There is something I don't understand here. Why keep this totally useless misfeature, when we already have many tracing facilities that can do just about the same job without having security issues? The recommendation in the advisory is literally to remove the kernel module from the fs. I don't see what could possibly be the use of such a misfeature as filemon; I would remove it completely from the kernel source tree. Maxime
Re: multiboot for amd64
On 17/12/2019 at 02:53, Emmanuel Dreyfus wrote: Hello The initial commit of multiboot for amd64 had a problem with kernel linking that broke BIOS boot. I fixed the linker script, but in the meantime, the multiboot option on amd64 was also commented out. Is there any objection against re-enabling multiboot for amd64?

Yes, there is. https://mail-index.netbsd.org/source-changes-d/2019/12/14/msg011884.html Also, the code is still bloated with unnecessary stuff, and still hasn't been moved to a separate function (or file).
Re: racy access in kern_runq.c
On 06/12/2019 at 20:37, Mindaugas Rasiukevicius wrote: Maxime Villard wrote: On 06/12/2019 at 17:53, Andrew Doran wrote: On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote: /* Update the worker */ - worker_ci = hci; + atomic_swap_ptr(&worker_ci, hci); Why atomic_swap_ptr() and not atomic_store_relaxed()? I don't see any bug that it fixes. Other than that it looks OK to me. Because I suggested it; my concern was that, if not explicitly atomic, the CPU could make two writes internally (even though the compiler produces only one instruction), and in that case a page fault would have been possible because of a garbage dereference. No, atomic_store_relaxed() is sufficient; that is what it is for. Alright. The patch needs to be completed though; i.e., 'spc_mcount++' too should use relaxed atomics, UVM style ([1]). [1] https://nxr.netbsd.org/diff/src/sys/uvm/uvm_fault.c?r2=%2Fsrc%2Fsys%2Fuvm%2Fuvm_fault.c%401.209&r1=%2Fsrc%2Fsys%2Fuvm%2Fuvm_fault.c%401.208
Re: racy access in kern_runq.c
On 06/12/2019 at 17:53, Andrew Doran wrote: On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote: /* Update the worker */ - worker_ci = hci; + atomic_swap_ptr(&worker_ci, hci); Why atomic_swap_ptr() and not atomic_store_relaxed()? I don't see any bug that it fixes. Other than that it looks OK to me. Because I suggested it; my concern was that, if not explicitly atomic, the CPU could make two writes internally (even though the compiler produces only one instruction), and in that case a page fault would have been possible because of a garbage dereference. To be honest, I'm not totally sure whether it is a valid concern; theoretically, it is.
Re: racy access in kern_runq.c
On 06/12/2019 at 11:28, Andrew Doran wrote: > On Fri, Dec 06, 2019 at 10:27:20AM +0100, Maxime Villard wrote: > >> With 'worker_ci', there is an actual safety issue, because the compiler could >> split the accesses and the hardware may not use atomics by default like x86. >> This could cause random page faults; so it needs to be strictly atomic. > > No I don't accept that. > > The ability to load and store a native word-sized int (and in more recent > years a pointer) with a single instruction is a fundamental assumption that > every operating system written in C rests upon. 1) That it is done with a single instruction does not necessarily mean it is atomic. There are many stupid things microcodes are allowed to do (not on x86, however). 2) That there is a one-line assignment of a variable in C does not necessarily mean that there will be one move instruction. There are many things the compiler is allowed to do in optimization passes, as said already. 3) "every operating system written in C rests upon": I have some difficulty with that. Linux has our equivalent of relaxed atomics (called {READ,WRITE}_ONCE), with 1000+ combined references in their tree. So no, here's a big open-source kernel that does not rest upon unclear assumptions for lockless operations. No way to check Windows, but I wouldn't be surprised if they had a similar approach. > If the compiler splits one of those accesses, then you are either using some > other data type, or have a broken compiler on your hands. Or you're just using a compiler that knows well enough how to arrange instructions for optimized pipelining, and this compiler decides to re-order the accesses based on desirable performance properties, as allowed by the standard. > https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html I don't see anything in this link that is really reassuring. "you can assume" that things are mostly ok on "systems we know of".
At least the rationale for {READ,WRITE}_ONCE is based on standards, and not wild assumptions: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE To me, Kengo's 2nd patch remains needed. Maxime
Re: racy access in kern_runq.c
On 06/12/2019 at 10:00, Andrew Doran wrote: > Hi, > > On Fri, Dec 06, 2019 at 03:52:23PM +0900, Kengo NAKAHARA wrote: > >> There are some racy accesses in kern_runq.c detected by KCSAN. Those >> racy access messages are so frequent that they cover other messages, >> so I want to fix them. They can be fixed by the following patch. > > Please don't commit this. These accesses are racy by design. There is no > safety issue and we do not want to disturb the activity of other CPUs in > this code path by locking them. We also don't want to use atomics either. > This code path is extremely hot; using atomics would impose a severe > performance penalty on the system under certain workloads. With 'worker_ci', there is an actual safety issue, because the compiler could split the accesses and the hardware may not use atomics by default like x86. This could cause random page faults; so it needs to be strictly atomic. Apart from that, yes, there is no other safety issue and locking would be too expensive. All we possibly care about is making sure the accesses aren't split, for the sake of not basing the scheduling policy on systematic garbage, and atomic_relaxed seems like the right thing to do (Kengo's 2nd patch), considering that it costs ~nothing. > Also if you change this to use strong ordering, you're quite likely to > change the selection behaviour of LWPs. This is something delicate that > exhibits reasonable scheduling behaviour in large part through randomness, > i.e. by accident, so serialising it is likely to have strong effects on how > LWPs are distributed. > > Lastly I have a number of experimental changes to this code which I'll be > committing in the near future, allowing people to change the LWP selection > policy to see if we can improve performance under different workloads. > They will also likely show up in KCSAN as racy/dangerous accesses.
> > My suggestion is to find a way to teach KCSAN that we know something is > racy, we like it being racy, and that it's not a problem, so that it no > longer shows up in the KCSAN output. Maybe we should indeed have a macro to say "yes this access is racy but we don't care". Currently this macro is atomic_{store,load}_relaxed()...
Re: racy access in kern_runq.c
On 06/12/2019 at 07:52, Kengo NAKAHARA wrote:
> Hi,
>
> There are some racy accesses in kern_runq.c detected by KCSAN. Those
> racy access messages are so frequent that they cover other messages,
> so I want to fix them. They can be fixed by the following patch.
>
> diff --git a/sys/kern/kern_runq.c b/sys/kern/kern_runq.c
> index 04ff1732016..bb0815689cd 100644
> --- a/sys/kern/kern_runq.c
> +++ b/sys/kern/kern_runq.c
> @@ -627,7 +627,7 @@ sched_balance(void *nocallout)
>  	/* Make lockless countings */
>  	for (CPU_INFO_FOREACH(cii, ci)) {
>  		spc = &ci->ci_schedstate;
> -
> +		spc_lock(ci);
>  		/*
>  		 * Average count of the threads
>  		 *
> @@ -643,10 +643,11 @@
>  			hci = ci;
>  			highest = spc->spc_avgcount;
>  		}
> +		spc_unlock(ci);
>  	}
>
>  	/* Update the worker */
> -	worker_ci = hci;
> +	atomic_swap_ptr(&worker_ci, hci);
>
>  	if (nocallout == NULL)
>  		callout_schedule(&balance_ch, balance_period);
> @@ -734,12 +735,15 @@ sched_idle(void)
>  		spc_unlock(ci);
>
>  no_migration:
> +	spc_lock(ci);
>  	if ((spc->spc_flags & SPCF_OFFLINE) != 0 || spc->spc_count != 0) {
> +		spc_unlock(ci);
>  		return;
>  	}
>
>  	/* Reset the counter, and call the balancer */
>  	spc->spc_avgcount = 0;
> +	spc_unlock(ci);
>  	sched_balance(ci);
>  	tci = worker_ci;
>  	tspc = &tci->ci_schedstate;

It just so happens we were talking about this yesterday. worker_ci indeed needs to be a real atomic, but it means we also have to fetch it atomically here:

	744 	tci = worker_ci;

For the other variables, my understanding was that we don't care about races, but only care about making sure the accesses are not split, so it seems to me atomic_relaxed would suffice.
Re: __{read,write}_once
On 21/11/2019 at 23:08, David Young wrote: On Thu, Nov 21, 2019 at 07:19:51PM +0100, Maxime Villard wrote: On 18/11/2019 at 19:49, David Holland wrote: On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote: > David Holland wrote: > > > I see the potential source of confusion, but just think about: what > > > could "atomic" possibly mean for loads or stores? A load is just a > > > load, there is no RMW cycle, conceptually; inter-locking the operation > > > does not make sense. For anybody who attempts to reason about this, > > > it should not be too confusing, plus there are man pages. > > > > ...that it's not torn. > > > > As far as names... there are increasingly many slightly different > > types of atomic and semiatomic operations. > > > > I think it would be helpful if someone came up with a comprehensive > > naming scheme for all of them (including ones we don't currently have > > that we're moderately likely to end up with later...) > > Yes, the meaning of "atomic" has different flavours and describes different > set of properties in different fields (operating systems vs databases vs > distributed systems vs ...) and, as we can see, even within the fields. > > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the > dominant terms. Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus. Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter.
This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above. > I think it is pointless to attempt to reinvent the wheel > here. It is terminology used by C11 (and C++11) and accepted by various > technical literature and, at this point, by academia (if you look at the > recent papers on memory models -- it's pretty much settled). These terms > are not too bad; it could be worse; and this bit is certainly not the worst > part of C11. So, I would just move on. Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things. So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering. _racy() doesn't really get at the intended meaning, and everything in C11 is racy unless you arrange otherwise by using mutexes, atomics, etc. The suffix has very little content. Names such as load_/store_fully() or load_/store_entire() or load_/store_completely() get to the actual semantics: at the program step implied by the load_/store_entire() expression, the memory constituting the named variable is loaded/stored in its entirety. In other words, the load cannot be drawn out over more than one local program step. Whether or not the load/store is performed in one step with respect to interrupts or other threads is not defined. Sounds like the right definition of the semantic. However: I feel like load_entire() and store_entire() get to the heart of the matter while being easy to speak, but _fully() or _completely() seem fine. 
I'm not a really big fan of _entire(), because it gives the impression that the load/store is not entire otherwise - which it is, but just possibly not in one step. How about _onestep()? Or _nosplit()?
Re: __{read,write}_once
On 18/11/2019 at 19:49, David Holland wrote: On Sun, Nov 17, 2019 at 02:35:43PM +, Mindaugas Rasiukevicius wrote: > David Holland wrote: > > > I see the potential source of confusion, but just think about: what > > > could "atomic" possibly mean for loads or stores? A load is just a > > > load, there is no RMW cycle, conceptually; inter-locking the operation > > > does not make sense. For anybody who attempts to reason about this, > > > it should not be too confusing, plus there are man pages. > > > > ...that it's not torn. > > > > As far as names... there are increasingly many slightly different > > types of atomic and semiatomic operations. > > > > I think it would be helpful if someone came up with a comprehensive > > naming scheme for all of them (including ones we don't currently have > > that we're moderately likely to end up with later...) > > Yes, the meaning of "atomic" has different flavours and describes different > set of properties in different fields (operating systems vs databases vs > distributed systems vs ...) and, as we can see, even within the fields. > > Perhaps not ideal, but "atomic loads/stores" and "relaxed" are already the > dominant terms. Yes, but "relaxed" means something else... let me be clearer since I wasn't before: I would expect e.g. atomic_inc_32_relaxed() to be distinguished from atomic_inc_32() or maybe atomic_inc_32_ordered() by whether or not multiple instances of it are globally ordered, not by whether or not it's actually atomic relative to other cpus. Checking the prior messages indicates we aren't currently talking about atomic_inc_32_relaxed() but only about atomic_load_32_relaxed() and atomic_store_32_relaxed(), which would be used together to generate a local counter. This is less misleading, but I'm still not convinced it's a good choice of names given that we might reasonably later on want to have atomic_inc_32_relaxed() and atomic_inc_32_ordered() that differ as above.
> I think it is pointless to attempt to reinvent the wheel > here. It is terminology used by C11 (and C++11) and accepted by various > technical literature and, at this point, by academia (if you look at the > recent papers on memory models -- it's pretty much settled). These terms > are not too bad; it could be worse; and this bit is certainly not the worst > part of C11. So, I would just move on. Is it settled? My experience with the academic literature has been that everyone uses their own terminology and the same words are routinely found to have subtly different meanings from one paper to the next and occasionally even within the same paper. :-/ But I haven't been paying much attention lately due to being preoccupied with other things. So in the end which name do we use? Are people really unhappy with _racy()? At least it has a general meaning, and does not imply atomicity or ordering.
Re: __{read,write}_once
Le 16/11/2019 à 15:31, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: Alright so let's add the macros with volatile (my initial patch). Which name do we use? I actually like __{read,write}_racy(). I suggest __atomic_load_relaxed()/__atomic_store_relaxed(). What I don't like with "atomic" in the name is that the instructions generated are not atomic strictly speaking, and I'd rather avoid the confusion with the really atomic instructions. But that's not a strong opinion, and I'm fine if we go with it anyway. If these are to be provided as macros, then I also suggest to ensure that they provide compiler-level barrier. You mean __insn_barrier(), right? Surrounding the access (one before, one after) also, right?
Re: __{read,write}_once
Alright so let's add the macros with volatile (my initial patch). Which name do we use? I actually like __{read,write}_racy().
Re: __{read,write}_once
Le 11/11/2019 à 13:51, Joerg Sonnenberger a écrit : On Mon, Nov 11, 2019 at 11:02:47AM +0100, Maxime Villard wrote: Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters. */ uvmexp.fltanget++; Wasn't the general consensus here to ideally have per-cpu counters that are aggregated passively? In this specific case it's not going to work, because uvmexp is supposed to be read from a core dump, and the stats are expected to be found in the symbol... I can think of three different options depending on the platform: (1) Use full atomic operations. Fine for true UP platforms and when the overhead of per-CPU precise accounting is too high. (2) Use uninterruptible memory operations in per-CPU memory, aggregate passively on demand. (3) Emulate uninterruptible memory operations with kernel RAS, aggregate passively on demand. Generally I would do (2), but in this specific case I suggest (1). atomic_inc_uint is supposed to be implemented on each platform already, so it's easy to do. (3) is a big headache for a very small reason, and it's not going to prevent inter-CPU races.
Re: __{read,write}_once
Le 08/11/2019 à 13:40, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: They are "atomic" in the sense that they prevent tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/store routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Let me try to clarify: - The main purpose of READ_ONCE()/WRITE_ONCE() is to provide a way to perform atomic loads/stores (in the sense of preventing the said behaviours), even though they help to get the memory ordering right too. Currently, 'volatile' is a key instrument in achieving that. However, as stated before, terms and conditions apply: 'volatile' just in itself does not provide the guarantee; the loads/stores also have to be properly aligned and word-sized (these are the pre-C11 assumptions we always had). Note: C11 introduces atomic _types_, so that the compiler could leverage the type system and thus provide the necessary guarantees. - Having re-read Marco's emails in this thread, I think we are very much in agreement. I think he merely points out that 'volatile' in itself is not sufficient; it does not mean it's not necessary. - There is quite a bit of confusion regarding 'volatile' amongst the developers. This is partly because 'volatile' is arguably underspecified in the C standard. 
AFAIK, some people in the C standardization committee have a view that it provides weaker guarantees; however, others maintain that the intent has always been clear and the wording is sufficient. Without going into the details (somewhat philosophical anyway), at least for now, 'volatile' is a de facto ingredient (one of a few, but absolutely necessary) in achieving atomic loads/stores. Sorry if this is a bit repetitive, but I hope it gets the point across. Alright, thanks. Do you think there is a performance degradation with using explicitly atomic operations (with the "lock" prefix on x86), compared to just using an aligned volatile which may not be exactly atomic in that sense (even if it happens to be on x86)? Typically in sys/uvm/uvm_fault.c there are several lockless stat increments like the following: /* Increment the counters.*/ uvmexp.fltanget++; In your (not just rmind@, but tech-kern@) opinion, is it better to switch to: atomic_inc_uint(&uvmexp.fltanget); or to __add_once(uvmexp.fltanget, 1); as I did in my patch? Considering that the latter may possibly not be atomic on certain arches, which could possibly result in garbage when the value is read. To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? Feel free to write a patch and I'll have a look at it once I have a little bit more free time. I've committed it right away, because this one seemed clear enough, feel free to comment or improve if you have a better idea: https://mail-index.netbsd.org/source-changes/2019/11/11/msg110726.html Maxime
Re: __{read,write}_once
Le 06/11/2019 à 23:41, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: - If we do not want to stick with the C11 API (its emulation), then I would suggest using similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason to invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. They are "atomic" in the sense that they prevent tearing, fusing and invented loads/stores. Terms and conditions apply (e.g. they assume properly aligned and word-sized accesses). Additionally, READ_ONCE() provides a data-dependency barrier, applicable only to DEC Alpha. I think it was the right decision on the Linux side (even though trying to get all the data-dependency barriers right these days is kind of a lost cause, IMO). So, READ_ONCE()/WRITE_ONCE() is more or less equivalent to the C11 atomic load/store routines with memory_order_relaxed (or memory_order_consume, if you care about DEC Alpha). But... Didn't Marco just say that 'volatile' accesses do not actually prevent tearing/fusing/invented loads/stores? READ_ONCE/WRITE_ONCE only do volatile accesses. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: i.e. in xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32-bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Correct. Alright, so we have the first bug found by KCSAN :) To fix that, do you agree that I should - Remove the first branch (because no lockless fastpath possible) - Move down the second branch (KASSERT) right after the mutex_enter ? There is more code in the NetBSD kernel which needs fixing. 
I would say pretty much all lock-free code should be audited. I believe KCSAN can greatly help with that, since it automatically reports concurrent accesses. Up to us then to switch to atomic, or other kinds of markers like READ_ONCE.
Re: __{read,write}_once
Le 06/11/2019 à 20:38, Mindaugas Rasiukevicius a écrit : Maxime Villard wrote: There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats). In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros: <...> A few comments for everybody here: - There is a general need in the NetBSD kernel for atomic load/store semantics. This is because plain accesses (loads/stores) are subject to _tearing_, _fusing_ and _invented_ loads/stores. I created a new thread which might help to clarify these and various other aspects: http://mail-index.netbsd.org/tech-kern/2019/11/06/msg025664.html Thanks - In C11, this can be handled with atomic_{load,store}_explicit() and memory_order_relaxed (or stronger memory order). - If we do not want to stick with the C11 API (its emulation), then I would suggest using similar terminology, e.g. atomic_load_relaxed() and atomic_store_relaxed(), or Linux READ_ONCE()/WRITE_ONCE(). There is no reason to invent new terminology; it might just add more confusion. But... Linux's READ_ONCE/WRITE_ONCE are not actually atomic, is that correct? So there is a significant semantic difference. Also, in my original patch I marked two branches of subr_xcall.c, but it seems to me now they are actually races: i.e. in xc_wait(), the read of 'xc_donep' could be made by two 4-byte fetches on 32-bit arches (at least). Between the two, an xc thread could have updated the value. So there is an actual race which could possibly result in returning early while we shouldn't have. Does that look correct to you? Thanks
Re: __{read,write}_once
Le 06/11/2019 à 17:37, Marco Elver a écrit : On Wed, 6 Nov 2019 at 16:51, Kamil Rytarowski wrote: On 06.11.2019 16:44, Kamil Rytarowski wrote: On 06.11.2019 15:57, Jason Thorpe wrote: On Nov 6, 2019, at 5:41 AM, Kamil Rytarowski wrote: On 06.11.2019 14:37, Jason Thorpe wrote: On Nov 6, 2019, at 4:45 AM, Kamil Rytarowski wrote: I propose __write_relaxed() / __read_relaxed(). ...except that seems to imply the opposite of what these do. -- thorpej Rationale? This matches atomic_load_relaxed() / atomic_write_relaxed(), but we do not deal with atomics here. Fair enough. To me, the names suggest "compiler is allowed to apply relaxed constraints and tear the access if it wants". But apparently the common meaning is "relax, bro, I know what I'm doing". If that's the case, I can roll with it. See below. -- thorpej Unless I miss something, this is exactly about relaxed constraints: "Relaxed operation: there are no synchronization or ordering constraints imposed on other reads or writes" and without "operation's atomicity is guaranteed". In the memory consistency model world, "relaxed" has a very specific meaning for loads/stores. It simply means that the compiler and architecture are free to reorder the memory operation with respect to previous or following memory operations in program order. But the load/store still happens as one atomic operation. For programming-language memory models, "relaxed" appears in the context of atomic memory operations, to define their ordering w.r.t. other operations in program order. There is a distinction between "relaxed atomic" and plain (unannotated) memory operations at the programming language level. For plain operations, the compiler, in addition to the target architecture, is free to reorder operations or even apply optimizations that turn single plain operations into multiple loads/stores [1, 2]. 
[1] https://lwn.net/Articles/793253/ [2] https://lwn.net/Articles/799218/ In the Linux kernel READ_ONCE/WRITE_ONCE are one way to specify "relaxed atomic read/write" for accesses that fit in a word (_ONCE also works on things larger than a word, but those probably aren't atomic anymore). For most architectures the implementation is the same, and merely avoids compiler optimizations; the exception is Alpha, where READ_ONCE adds a memory barrier [3]. [3] https://www.kernel.org/doc/Documentation/memory-barriers.txt This is also similar to what Google suggested applying to NetBSD in our internal thread, but with slightly different naming. What you call the ops is entirely up to you. I would not use '{read,write}_relaxed', since, as you pointed out above, it is probably more confusing. This is why I suggested 'atomic_{read,write}_relaxed'. If you do not like the 'atomic' in there, the next best thing is probably '{read,write}_once'. Just note that you're now defining your own memory consistency model. This is a complex topic that spans decades of work. The safest thing is to stick as closely to the C11/C++11 memory model as possible [4]. The Linux kernel also has a memory model [5], but it is significantly more complex since it was defined after various primitives had been introduced. [4] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1479.htm [5] https://github.com/torvalds/linux/blob/master/tools/memory-model/Documentation/explanation.txt Best Wishes, -- Marco Thanks for the details. I must say I still have a doubt about the instructions actually emitted by the compiler. Let's take READ_ONCE for example. What it does is basically just casting the pointer to volatile and doing the access. Two points of confusion: - Is 'volatile' a guarantee that the compiler will emit only one instruction to fetch the value? - Assuming there is only one instruction, strictly speaking the fetch is not guaranteed to be atomic, is that correct? 
Because the address may not be naturally aligned; or the CPU may not be x86 and thus may not provide automatic atomicity in such a case. Thanks
Re: __{read,write}_once
Le 06/11/2019 à 12:38, Martin Husemann a écrit : On Wed, Nov 06, 2019 at 12:31:37PM +0100, Maxime Villard wrote: __read_once(x) __write_once(x, val) The names are really not helpful for understanding what is supposed to happen here. I don't know if you have a better suggestion, but it's basically the naming convention that seems to have been used for several years already. Maybe we could have something more explicit, like __{read,write}_racy() or __{read,write}_parallel() or __{read,write}_concurrent() But the last one is too long, I think.
__{read,write}_once
There are cases in the kernel where we read/write global memory locklessly, and accept the races either because it is part of the design (eg low-level scheduling) or we simply don't care (eg global stats). In these cases, we want to access the memory only once, and need to ensure the compiler does not split that access in several pieces, some of which may be changed by concurrent accesses. There is a Linux article [1] about this, and also [2]. I'd like to introduce the following macros: __read_once(x) __write_once(x, val) These macros: - Are supposed to ensure that reads/writes result in only one access. We basically just cast to volatile for now. - Serve as markers for KCSAN, to say "yes there is a race, but it's expected and you don't need to care". - Help understanding the code, ie the reader will easily see that the area can legitimately race. In short, the main role they play is markers, to inform the reader and sanitizers. I've made a patch [3], with several changes already to use these macros. Feel free to comment [1] https://lwn.net/Articles/508991/ [2] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE [3] https://m00nbsd.net/garbage/kcsan/access-once.diff
nvmm: group
Currently /dev/nvmm has these permissions c rw- --- --- root wheel Which means the user has to tune the permissions manually. I would like to introduce an "nvmm" group, and change the perms to c rw- r-- --- root nvmm This means that emulators will have to be setgid nvmm, or the caller must be part of nvmm. Also, I want to make nvmm_init() public. Until now it was invoked during the first NVMM call within libnvmm. From now on it's the emulators that must invoke it before any other NVMM function. The point is that the emulator now has the ability to manually set the proper permissions before invoking nvmm_init() and drop them after the call has completed. Here is a patch [1] that adds this group (are there more changes needed?), and makes nvmm_init public. Feel free to comment, I'll commit it soon. [1] https://m00nbsd.net/garbage/nvmm/nvmm-group.diff
Re: make mbuf clusters coherency aligned again
Le 17/10/2019 à 19:39, Tobias Nygren a écrit : Hi, Since there were some concerns raised that this problem shouldn't be fixed in the driver ... https://mail-index.netbsd.org/source-changes/2019/10/15/msg109984.html ... I'm going to partially back it out and commit this change instead. To summarize why: having mbuf clusters not coherency-aligned is generally unsafe because there is, as far as I have been able to determine, nothing that prevents the kernel from clobbering memory in the lower portion of a cache line whilst the upper portion of that cache line is involved in a DMA transfer.
Index: uipc_mbuf.c
===
RCS file: /cvsroot/src/sys/kern/uipc_mbuf.c,v
retrieving revision 1.234
diff -p -u -r1.234 uipc_mbuf.c
--- uipc_mbuf.c 28 Sep 2019 16:02:12 - 1.234
+++ uipc_mbuf.c 17 Oct 2019 17:32:30 -
@@ -188,8 +188,8 @@ mbinit(void)
 	NULL, IPL_VM, mb_ctor, NULL, NULL);
 	KASSERT(mb_cache != NULL);
-	mcl_cache = pool_cache_init(mclbytes, 0, 0, 0, "mclpl", NULL,
-	IPL_VM, NULL, NULL, NULL);
+	mcl_cache = pool_cache_init(mclbytes, COHERENCY_UNIT, 0, 0, "mclpl",
+	NULL, IPL_VM, NULL, NULL, NULL);
 	KASSERT(mcl_cache != NULL);
 	pool_cache_set_drain_hook(mb_cache, mb_drain, NULL);
Yes, looks good to me.
Re: alloca() in kernel code
Le 12/10/2019 à 02:01, Emmanuel Dreyfus a écrit : Hello I recently encountered a situation where I had to deal with a variable-length structure at a time when the kernel dynamic allocator was not initialized. Using alloca() to have the data allocated on the stack seemed attractive, but unfortunately the kernel build infrastructure bans that: For good reasons. error: stack protector not protecting local variables: variable length buffer [-Werror=stack-protector] As a result, I have to rely on a fixed-size buffer big enough so that I can hope data will fit, and miserably fail otherwise. Is it possible to relax the protection on some code sections? No. The kernel stack is rather small, we should not start allocating variable-sized stuff on it. Or is there an alternative way of dealing with that? You didn't say what exactly you wanted to allocate memory for, so it's hard to tell. In the rare legitimate cases where I've had to do that, I picked buffers big enough. In all cases, we should never allocate variable-sized buffers on the stack in the kernel.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 21:21, Mouse a écrit : So why the hell are you throwing in statements like immense trouble for little actual use. Because this is true! Do you understand, or not? Pfff. It's not me you were responding to here. But you appear to be falling into a relatively common trap: operating from the assumption that you are entirely correct and that your underlying assumptions are universally true, and that therefore anyone who disagrees with you must not understand what you are saying. Actually, you are right. This is the N-th time that this problem comes up. For the N-th time, I see people slipping away, trying to find moronic justifications like "maybe the users don't know they use compat_linux". Of course I get upset, I've heard this crap N-1 times before. And the bigger the N gets, the more I am confident that I'm right, yes. And the less I'm tempted to discuss, also. I am absolutely upset at this moronic topic that has to come up again, and again, and again, because people can't process one minute that there is a *RISK* in these compat features. It's been literally years of this total nonsense. And I've heard all of the most idiotic reasons: from the "uh no you are just paranoid", to the "but you have no proof it is buggy", then to the "oh no but every software has bugs", down to now "maybe the users don't know they use compat_linux". You know who's the dumbass who is cleaning all this shit up each time? Well, it's me. I am the one trying to fix the bugs, processing pullups, writing security advisories, and even handling the bad press. And I did that again two weeks ago when I set up a fuzzing VM to find bugs in compat_linux which has resulted in all the fixes that now need even more effort to be pulled up, documented, etc, because this buggy crap is enabled by default. People are here creating a f*cking mess, and I'm here systematically doing the damage control. The slightest proposal is met with the biggest nonsense. 
Complete lack of engineering spirit on things as simple as disabling risky stuff, while other OSes have done that for years without problem. Idiocy at its finest. That's it, I've said it. I understand what you are claiming. I think most of us do. I simply disagree with some of it and see you as having failed to present a good case for most of the rest. Repeating your unsubstantiated claims - such as the one that you readily admitted was unsubstantiated when I called you on it - is not going to convince anyone of anything, possibly excepting that you have such a weak point that you have to resort to appeals to emotion rather than actual technical arguments to support it. The "unsubstantiated" claim is the percentage which I immediately clarified to you as being illustrative but not definitive. The only ones who are making unsubstantiated claims are the ones who keep talking nonsense; like John Nemeth for example, who as usual makes a great demonstration of his intelligence by showing up late in a conversation and answering to each email in backwards order on points that have been addressed already. You are not worth my time. And you're now resorting to...well, it's not a classic ad hominem, but it's pretty close. (It's not any individual that should be "worth [your] time" or not, but rather the arguments presented.) Your arguments are convincing, but what they are convincing me, for one, of is not what you appear to want to convince people of. Well it appears you too have convinced me of something you were not expecting: that you are not worth my time either. Trying to suggest I'm making unsubstantiated claims when I'm not. Trying to say I use emotions while I put forward the technical reasons and solution. Quoting single sentences like the above without quoting the paragraph. Do you want to contribute and do all the actual dirty work for once, or you're just here to talk and give your random opinions on things you've never invested yourself in? What is your background? 
What is your portfolio? Verily, I'm glad this thread has reached this point. Because it has just cleared away the cobwebs in my mind. My current inability to discuss this kind of topic, and to work through the nonsense, irrelevance and profound idiocy in the answers I'm facing, just shows I've accumulated so much frustration and anger towards this project. I believe it is clear, and in this regard, you are not completely wrong. So here's what's going to happen. I will drop this thread, because in the end I don't care a lot. I will stop doing the work I've been doing, have withdrawn from secteam already, and won't be taking care of any of that from now on. I'm done, this is over. Good luck
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 18:56, Kamil Rytarowski a écrit : Statement that Linux-compat is not much used is false at least due to adobe-flash-plugin-11.2.202.644nb1. I find this package as a must have for a web browser. Uh? No, this is absolutely wrong. Flash player will be completely abandoned by Adobe in 2020. The vast majority of the internet has switched to equivalents. It has also been disabled by default by Firefox and other web browsers for years. Flash player is absolutely not a justification for compat_linux.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 19:21, Edgar Fuß a écrit : I do understand you are not proposing to remove compat_linux. I do understand your proposal is just about disabling module auto-load (which I don't care about because I run a monolithic kernel). So why the hell are you throwing in statements like immense trouble for little actual use. Because this is true! Do you understand, or not? Pfff. Yes, compat_linux has had vulnerabilities, repeatedly, and yes, this has caused us immense trouble, like the situation right now, where we have things to pull up and advisories to write but no one willing to do that work. This has happened repeatedly, and I'm making a proposal to disable this feature, to prevent this trouble in the future. Nothing controversial about that, it just makes sense. You are not worth my time.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 19:39, Martin Husemann a écrit : Btw: I find it very strange that this proposal has no concrete proposal. What compat options exactly do you want to disallow from auto loading? What exact sysctls and semantics do you have in mind? My proposal is the same as Taylor's, as I said in the initial email. However, maybe keep the NetBSD-specific compat enabled as far as compat_50 or compat_60. The semantic is the one I stated in the initial email: the use of kernel modules, now that this is doable in practice.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 18:19, Kamil Rytarowski a écrit : On 26.09.2019 15:55, Kamil Rytarowski wrote: On 26.09.2019 15:06, Mouse wrote: [...] compat_linux and compat_linux32 [...] Keeping them enabled for the <1% users interested means keeping vulnerabilities for the >99% who don't use these features. Are the usage numbers really that extreme? Where'd you get them? I didn't think there were any mechanisms in place that would allow tracking compat usage. I depend on compat_linux/compat_linux32 in at least 4 applications. I don't use them daily, but frequently. There are use-cases where linux_compat* is to be or not to be for NetBSD as host. (At least one commercial user of NetBSD depends on it as well.) In general it is fine to disable linux_compat* unless we can ensure its correctness with regression tests and continuous fuzzing. As I discussed with one developer, we could test linux_compat with a faked/light linux libc; unfortunately it has not materialized yet. I have managed to get a light cross-toolchain producing Linux binaries. The only dependency is musl (+gmake to build it). It works almost out of the box (I had to manually build certain files). The only thing that is required to be tuned is to add a dummy gnu_debuglink. https://nxr.netbsd.org/xref/src/sys/compat/linux/common/linux_exec_elf32.c#214 The kernel shall have a fallback, and probably parse the .interp string and look for 'ld-musl' in it.
$ cat m.c
#include <stdio.h>
int main(int argc, char **argv) { printf("Hello world!\n"); return 0; }
$ musl-gcc m.c
$ objcopy --add-gnu-debuglink=/dev/null ./a.out
$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, not stripped
$ ./a.out
Hello world!
Another step is to build LTP with musl-gcc and run the tests. https://github.com/linux-test-project/ltp (LTP supports musl) There will be some patches involved. The question is how to set up the tests for it? anita + picking these packages from pkgsrc? 
Making a setup with a syzbot instance for linux32/linux64 targets should not be that difficult. Unfortunately all that needs some work in various projects, integration, upstreaming code (to LTP?) etc and maybe TNF could spare someone to do it. Also linux_compat is getting more and more irrelevant as time passes due to shortcomings in our futex code (lack of robust futexes). This is slowly being worked on. I think you are confused. Fuzzing does not act as a perfusion to keep something artificially alive. Part of the compat_linux bugs and vulns I fixed recently were found with a simple fuzzing VM I set up. Yet, does it really change anything? My proposal is unchanged. The attack surface is huge, I've been able to exercise only a subset of the syscalls. Now big pullups need to be done, and no one is willing to do them. The scanner bot also has found bugs that couldn't be found with fuzzers. Disabling certain compat options preserves the functionality, significantly reduces the attack surface, and eases maintenance as well. At least the bugs do not qualify as critical vulnerabilities if the feature is disabled. Fuzzing does not reduce the attack surface, and does not constitute "active maintenance" either (like pulling up, writing advisories, etc...).
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 18:15, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 05:28:11PM +0200, Maxime Villard wrote: Le 26/09/2019 à 17:25, Brian Buhrow a écrit : [...] One implication of your proposal is that you'll disable the autoload functionality, users will turn it back on, use it, and be more vulnerable than they are now because the primary developers aren't concerned with making things work or secure anymore. Nobody is making compat_linux work, nobody is making compat_linux secure. My experience with a Linux program using forks and signals is that it does not work at all; the children get killed randomly for no reason, the parent doesn't receive the signals, and after some time everything needs to be killed and restarted to work again. Completely broken. I didn't manage to find where exactly the problem was. Under reasonable assumptions, compat_linux indeed used to be the most maintained compat layer we had. This isn't the case anymore. Under reasonable assumptions as well, it has a marginal use case, and can be disabled. Maybe Manuel can understand that for a minute? Or is he still looking for evidence that I'm not the Pope? Secure, I don't know. Well Manuel can pretend everything he wants, but when it comes to security and compat_linux, we're entering the world of facts, and not reasonable assumptions. compat_linux has a history of being vulnerable, the main reason being that it is poorly tested - no ATF, no reliable way to exercise the code comprehensively - and not actively maintained either. This has led to a series of exploitable vulnerabilities in the past, and a few weeks ago I found even more of them using a rather simplistic QA technique. I want to remind you, once again, that no one is talking about removing functionality. We are talking about disabling it by default. The functionality will still be there, and only one command away from being enabled. 
FreeBSD has been doing that for a while, and OpenBSD did too before removing their version of compat_linux. It is simple, and it makes sense, and I don't understand the fuss and the nonsense that Manuel is trying to throw here.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 16:47, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote: Actually this is not clear. We have linux binaries in pkgsrc. ... And? We have 22000 packages in pkgsrc. How is it relevant? I install less than 200. But there is suse_base in them. The real question is how the fact that there are linux binaries in pkgsrc is relevant. Yes, there are packages. And? Most of them are completely outdated and haven't been updated in the last 10 years.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 17:25, Brian Buhrow a écrit : [...] One implication of your proposal is that you'll disable the autoload functionality, users will turn it back on, use it, and be more vulnerable than they are now because the primary developers aren't concerned with making things work or secure anymore. Nobody is making compat_linux work, nobody is making compat_linux secure.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 17:15, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote: issues for a clearly marginal use case, and given the current general ^^^ This is where we disagree. You guess it's marginal but there's no evidence of that (and there's no evidence of the opposite either). Can you provide evidence that it is used by the majority of the users? And that therefore keeping vulnerabilities for 100% of the people is legitimate? Please provide clear evidence.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 17:03, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 04:52:35PM +0200, Maxime Villard wrote: Le 26/09/2019 à 16:47, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote: Actually this is not clear. We have linux binaries in pkgsrc. ... And? We have 22000 packages in pkgsrc. How is it relevant? I install less than 200. But there is suse_base in them. The real question is how the fact that there are linux binaries in pkgsrc is relevant. Yes, there are packages. And? Most of them are it's relevant because they are in the binary repositories, so anyone can install them with pkg_add or pkgin. And they don't even have to know these are linux binaries. completely outdated and haven't been updated in the last 10 years. They were useful 10 years ago; it doesn't make them useless now. For example I use them to run eagle, or microchip compilers, on a regular basis. Have you at least read my initial email? I am not saying compat_linux is useless, I am saying that it has security issues for a clearly marginal use case, and given the current general unwillingness to maintain it and handle the security issues in it, it is a good candidate for being disabled (!= removed). Same as it was in 2017 when Taylor started the first thread. Can we stop this moronic conversation and get back to the point? Thanks. Disabling it doesn't mean removing it, and "modload compat_linux" will restore the functionality entirely while still protecting against the vulnerabilities by default.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 16:36, Manuel Bouyer a écrit : On Thu, Sep 26, 2019 at 04:29:52PM +0200, Maxime Villard wrote: Le 26/09/2019 à 16:22, Mouse a écrit : Keeping them enabled for the <1% users interested means keeping vulnerabilities for the >99% who don't use these features. Are the usage numbers really that extreme? Where'd you get them? I didn't think there were any mechanisms in place that would allow tracking compat usage. No, there is no strict procedure to monitor compat usage, and there never will be. Maybe it's not <1%, but rather 1.5%; or maybe it's 5%, 10%, 15%. Who cares, exactly? The short answer is "anyone who wants NetBSD to be useful". If it really is only a tiny fraction - under ten people, say - then, sure, yank it out. If it's 90%, removing it would lose most of the userbase, possibly provoke a fork. 15%, 40%, I don't think there is a hard line between "pull it" and "keep it", and even if there were I'm not sure it would matter because it appears nobody knows what the actual use rate is anyway. What is known, however, is that 100% of the users are affected by the vulnerabilities. So, do we keep these things enabled by default just because "uh we don't know so we shouldn't do anything"? Even as it's already been clear that the majority doesn't use compat_linux? Actually this is not clear. We have linux binaries in pkgsrc. ... And? We have 22000 packages in pkgsrc. Is it such a Herculean effort to type "modload compat_linux" for the people that want to use Linux binaries? In order to keep the majority safe from the bugs and vulnerabilities? Maybe some of them don't even know they are using compat_linux ... Yeah, and maybe I'm the Pope also, who knows.
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 16:22, Mouse a écrit : Keeping them enabled for the <1% users interested means keeping vulnerabilities for the >99% who don't use these features. Are the usage numbers really that extreme? Where'd you get them? I didn't think there were any mechanisms in place that would allow tracking compat usage. No, there is no strict procedure to monitor compat usage, and there never will be. Maybe it's not <1%, but rather 1.5%; or maybe it's 5%, 10%, 15%. Who cares, exactly? The short answer is "anyone who wants NetBSD to be useful". If it really is only a tiny fraction - under ten people, say - then, sure, yank it out. If it's 90%, removing it would lose most of the userbase, possibly provoke a fork. 15%, 40%, I don't think there is a hard line between "pull it" and "keep it", and even if there were I'm not sure it would matter because it appears nobody knows what the actual use rate is anyway. What is known, however, is that 100% of the users are affected by the vulnerabilities. So, do we keep these things enabled by default just because "uh we don't know so we shouldn't do anything"? Even as it's already been clear that the majority doesn't use compat_linux? Is it such a Herculean effort to type "modload compat_linux" for the people that want to use Linux binaries? In order to keep the majority safe from the bugs and vulnerabilities?
Re: Proposal, again: Disable autoload of compat_xyz modules
Le 26/09/2019 à 15:06, Mouse a écrit : [...] compat_linux and compat_linux32 [...] Keeping them enabled for the <1% users interested means keeping vulnerabilities for the >99% who don't use these features. Are the usage numbers really that extreme? Where'd you get them? I didn't think there were any mechanisms in place that would allow tracking compat usage. No, there is no strict procedure to monitor compat usage, and there never will be. Maybe it's not <1%, but rather 1.5%; or maybe it's 5%, 10%, 15%. Who cares, exactly? This compat topic has been discussed over and over, and the conclusion is systematically that these compat options cause immense trouble for little actual use. Now, we are again discussing the matter, because the problem is still there.
Proposal, again: Disable autoload of compat_xyz modules
I recently made a big set of changes to fix many bugs and vulnerabilities in compat_linux and compat_linux32, the majority of which have a security impact bigger than the Intel CPU bugs we hear about so much. These compat layers are enabled by default, so everybody is affected. Secteam is in a state where no one is willing to pull up all the changes to the stable branches, because of the effort. No one is willing to write a security advisory either. When I say "no one", it includes me. The proposal and discussion held in this 2017 thread still hold and are unchanged two years later: https://mail-index.netbsd.org/tech-kern/2017/07/31/msg022153.html The compat layers are largely untested, often broken, and are a security risk for everybody. Keeping them enabled for the <1% users interested means keeping vulnerabilities for the >99% who don't use these features. In the conversation above, we hit the problem that there was cross-dependency among compat modules, and we couldn't selectively disable specific layers. Today this is possible thanks to pgoyette's work. That is, it is possible to comment out "options COMPAT_LINUX" from GENERIC, and have a compat_linux.kmod which will modload correctly and be able to run Linux binaries out of the box. Under this scheme, the feature would be only one root command away from being enabled in the kernel. Therefore, I am making today the same proposal as Taylor in 2017, because the problem is still there exactly as-is and we just hit it again; the solution however is more straightforward.
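To make the mechanics concrete, this is roughly what the proposal amounts to in practice. The config option and module name are the ones mentioned above; the exact layout of the commands is an illustrative sketch, not an authoritative recipe:

```shell
# In the kernel config (GENERIC), the compat layer stops being built in:
#options 	COMPAT_LINUX	# <- commented out

# The functionality remains one root command away, via the kmod:
modload compat_linux

# And it can be unloaded again when no longer needed:
modunload compat_linux
```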
Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS
Le 02/07/2019 à 16:21, J. Hannken-Illjes a écrit : It would work but unfortunately dtrace is a module and cannot use PMAP_DIRECT_BASE aka. pmap_direct_base. These variables are not visible from module context and don't exist if the kernel is built with KASAN. Not related to the issue (which has been fixed), but: while the system will remain functional if you load a GENERIC kmod on a KASAN kernel, the KASAN detection you get when executing the kmod is only partial. Stack overflows in the kmod, for example, won't be detected. If you want the detection, you need to pass the KASANFLAGS content (different on x86_64 and aarch64) as CFLAGS when building the kmod, plus -DKASAN. I think I will add "no options MODULAR" with KASAN, so that there's no confusion. There aren't many reasons for using modules under KASAN anyway, and the rule when it comes to kernel compiler instrumentation is that everything should be monolithic.
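As a sketch of what the flag-passing described above might look like when rebuilding a kmod: the real KASANFLAGS content comes from the kernel build and differs per architecture, so the module path and flag values below are illustrative assumptions, not the actual variable contents:

```shell
# Hypothetical sketch: rebuild a kmod with KASAN instrumentation enabled.
# The flags must match the KASANFLAGS used for the KASAN kernel build
# (x86_64 and aarch64 use different values); these are only the shape.
cd /usr/src/sys/modules/examplemod   # hypothetical module directory
make clean
make COPTS="-fsanitize=kernel-address -DKASAN"
```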
Re: pool guard
Le 12/09/2019 à 08:21, Maxime Villard a écrit : Le 06/09/2019 à 15:09, Maxime Villard a écrit : An idea for a feature similar to KMEM_GUARD - which I recently removed because it was too weak and useless -, but this time at the pool layer, covering certain specific pools, without memory consumption or performance cost, and enabled by default at least on amd64. Note that this is hardening and exploit mitigation, but not bug detection, so it will be of little interest in the context of fuzzing. Note also that it targets 64bit arches, because they have nearly unlimited VA. The idea is that we can use special guard allocators on certain pools to prevent important kernel data from being close to untrusted data in the VA space. Suppose the kernel is parsing a received packet from the network, and there is a buffer overflow which causes it to write beyond the mbuf. The data is in an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page allocated using the default pool allocator. After the 4K page in memory, there could be critical kernel data sitting, which an attacker could overwrite.

                              overflow --->
    +------------+------------+----------------------+
    | 2K Cluster | 2K Cluster | Critical Kernel Data |
    +------------+------------+----------------------+
    <----- usual 4K pool page -----><- another 4K page ->

This is a scenario that I already encountered when working on NetBSD's network stack. Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers to allocate buffers with unmapped pages at the beginning and the end). The pool layer sees pages of size 128K, and packs 64 2K clusters in them.

                 overflow ~>
    +----------+-------+-------+-------+-------+-------+----------+
    | Unmapped | 2K C. | 2K C. | [...] | 2K C. | 2K C. | Unmapped |
    +----------+-------+-------+-------+-------+-------+----------+
    <-- 64K --><------ 128K pool page with 64 clusters ------><-- 64K -->

The pool page header is off-page, and bitmapped. Therefore, there is strictly no kernel data in the 128K pool page. The overflow still occurs, but this time the critical kernel data is far from here, after the unmapped pages at the end.
At worst only other clusters get overwritten; at best we are close to the end and hit a page fault which stops the overflow. 64K is chosen as the maximum of uint16_t. No performance cost, because these guarded buffers are allocated only when the pools grow, which is a rare operation that occurs almost only at boot time. No actual memory consumption either, because unmapped areas don't consume physical memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on amd64, far beyond what we will ever need -, so no problem with consuming VA. The code is here [1] for mcl, it is simple and works fine. It is not perfect but can already prevent a lot of trouble. The principle could be applied to other pools. [1] https://m00nbsd.net/garbage/pool/guard.diff If there are no further comments, I will commit it within a week. Actually I realized there is a problem. uvm_km_guard must use uvm_map and not the kmem arena, because the kmem arena is proportionate to the PA. The thing is, using uvm_map is illegal, because we're sometimes in interrupt context. Needs to be revisited.
Re: pool guard
Le 06/09/2019 à 15:09, Maxime Villard a écrit : An idea for a feature similar to KMEM_GUARD - which I recently removed because it was too weak and useless -, but this time at the pool layer, covering certain specific pools, without memory consumption or performance cost, and enabled by default at least on amd64. Note that this is hardening and exploit mitigation, but not bug detection, so it will be of little interest in the context of fuzzing. Note also that it targets 64bit arches, because they have nearly unlimited VA. The idea is that we can use special guard allocators on certain pools to prevent important kernel data from being close to untrusted data in the VA space. Suppose the kernel is parsing a received packet from the network, and there is a buffer overflow which causes it to write beyond the mbuf. The data is in an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page allocated using the default pool allocator. After the 4K page in memory, there could be critical kernel data sitting, which an attacker could overwrite.

                              overflow --->
    +------------+------------+----------------------+
    | 2K Cluster | 2K Cluster | Critical Kernel Data |
    +------------+------------+----------------------+
    <----- usual 4K pool page -----><- another 4K page ->

This is a scenario that I already encountered when working on NetBSD's network stack. Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers to allocate buffers with unmapped pages at the beginning and the end). The pool layer sees pages of size 128K, and packs 64 2K clusters in them.

                 overflow ~>
    +----------+-------+-------+-------+-------+-------+----------+
    | Unmapped | 2K C. | 2K C. | [...] | 2K C. | 2K C. | Unmapped |
    +----------+-------+-------+-------+-------+-------+----------+
    <-- 64K --><------ 128K pool page with 64 clusters ------><-- 64K -->

The pool page header is off-page, and bitmapped. Therefore, there is strictly no kernel data in the 128K pool page. The overflow still occurs, but this time the critical kernel data is far from here, after the unmapped pages at the end.
At worst only other clusters get overwritten; at best we are close to the end and hit a page fault which stops the overflow. 64K is chosen as the maximum of uint16_t. No performance cost, because these guarded buffers are allocated only when the pools grow, which is a rare operation that occurs almost only at boot time. No actual memory consumption either, because unmapped areas don't consume physical memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on amd64, far beyond what we will ever need -, so no problem with consuming VA. The code is here [1] for mcl, it is simple and works fine. It is not perfect but can already prevent a lot of trouble. The principle could be applied to other pools. [1] https://m00nbsd.net/garbage/pool/guard.diff If there are no further comments, I will commit it within a week.
Re: pool guard
Le 08/09/2019 à 08:54, Jason Thorpe a écrit : On Sep 8, 2019, at 9:29 AM, Maxime Villard wrote: Don't confuse VA and PA. NetBSD-amd64 supports 16TB of PA, so even if you have 48TB nvdimms it gets truncated to 16TB. Then, we have 32TB of VA, twice the maximum PA. So again, we are fine. Yes, but obviously we should fix that at some point. "Fixing" this entails first having a UVM that can scale up to and work reasonably well with such gigantic amounts of RAM, which is far from being the case currently. We will need to have SMP, NUMA, large pages, and in short, we will likely have to rewrite UVM entirely. By the time we have done all of that, the 64bit CPUs will already ship with 5-level page tables, which multiply the size of the VA space by 512, so again, we will be largely fine. Even otherwise, revisiting pool guards to increase the mapped/unmapped ratio is a very insignificant and easy task. I see no problem related to VA/PA amounts.
Re: pool guard
Le 07/09/2019 à 23:47, matthew green a écrit : No performance cost, because these guarded buffers are allocated only when the pools grow, which is a rare operation that occurs almost only at boot time. No actual memory consumption either, because unmapped areas don't consume physical memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on amd64, far beyond what we will ever need -, so no problem with consuming VA. i like this idea, but i would like to point out that HPE already sell a machine with 48TiB ram and nvdimms are going to explode the apparent memory in the coming years, so "32TiB" is very far from "plenty". we have many challenges to get beyond 8TiB tho, since we count pages in 'int' all over uvm. Don't confuse VA and PA. NetBSD-amd64 supports 16TB of PA, so even if you have 48TB nvdimms it gets truncated to 16TB. Then, we have 32TB of VA, twice the maximum PA. So again, we are fine.
pool guard
An idea for a feature similar to KMEM_GUARD - which I recently removed because it was too weak and useless -, but this time at the pool layer, covering certain specific pools, without memory consumption or performance cost, and enabled by default at least on amd64. Note that this is hardening and exploit mitigation, but not bug detection, so it will be of little interest in the context of fuzzing. Note also that it targets 64bit arches, because they have nearly unlimited VA. The idea is that we can use special guard allocators on certain pools to prevent important kernel data from being close to untrusted data in the VA space. Suppose the kernel is parsing a received packet from the network, and there is a buffer overflow which causes it to write beyond the mbuf. The data is in an mbuf cluster of size 2K (on amd64). This mbuf cluster sits on a 4K page allocated using the default pool allocator. After the 4K page in memory, there could be critical kernel data sitting, which an attacker could overwrite.

                              overflow --->
    +------------+------------+----------------------+
    | 2K Cluster | 2K Cluster | Critical Kernel Data |
    +------------+------------+----------------------+
    <----- usual 4K pool page -----><- another 4K page ->

This is a scenario that I already encountered when working on NetBSD's network stack. Now, we switch the mcl pool to use the new uvm_km_guard API (simple wrappers to allocate buffers with unmapped pages at the beginning and the end). The pool layer sees pages of size 128K, and packs 64 2K clusters in them.

                 overflow ~>
    +----------+-------+-------+-------+-------+-------+----------+
    | Unmapped | 2K C. | 2K C. | [...] | 2K C. | 2K C. | Unmapped |
    +----------+-------+-------+-------+-------+-------+----------+
    <-- 64K --><------ 128K pool page with 64 clusters ------><-- 64K -->

The pool page header is off-page, and bitmapped. Therefore, there is strictly no kernel data in the 128K pool page. The overflow still occurs, but this time the critical kernel data is far from here, after the unmapped pages at the end.
At worst only other clusters get overwritten; at best we are close to the end and hit a page fault which stops the overflow. 64K is chosen as the maximum of uint16_t. No performance cost, because these guarded buffers are allocated only when the pools grow, which is a rare operation that occurs almost only at boot time. No actual memory consumption either, because unmapped areas don't consume physical memory, only virtual, and on 64bit arches we have plenty of that - eg 32TB on amd64, far beyond what we will ever need -, so no problem with consuming VA. The code is here [1] for mcl, it is simple and works fine. It is not perfect but can already prevent a lot of trouble. The principle could be applied to other pools. [1] https://m00nbsd.net/garbage/pool/guard.diff
Re: re-enabling debugging of 32 bit processes with 64 bit debugger
https://www.netbsd.org/~christos/ptrace32.diff looks good to me, however the function should be called something like cpu_mcontext32from64_validate, to make it clear that it's just cpu_mcontext32_validate with a 64bit structure and in process_write_regs(), add {}s in the else too
Re: [PATCH v2] Include XSTATE note in x86 core dumps
Le 05/07/2019 à 17:22, Michał Górny a écrit :
+#ifdef EXEC_ELF32
+#include
+#endif
+#define PT64_GETXSTATE PT_GETXSTATE
+#define COREDUMP_MACHDEP_LWP_NOTES \
+{ \
+	struct xstate xstate; \
memset needed, otherwise we leak stuff
+	error = process_read_xstate(l, &xstate); \
+	if (error) \
+		return (error); \
+	ELFNAMEEND(coredump_savenote)(ns, \
+	    CONCAT(CONCAT(PT, ELFSIZE), _GETXSTATE), name, &xstate, \
+	    sizeof(xstate)); \
+}
Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS
Le 02/07/2019 à 16:21, J. Hannken-Illjes a écrit : On 1. Jul 2019, at 20:12, Maxime Villard wrote: Le 01/07/2019 à 19:40, J. Hannken-Illjes a écrit : Sometimes kmem_alloc() returns an address below VM_MIN_KERNEL_ADDRESS, something like kmem_alloc() -> 0xaba25800a5a8 with VM_MIN_KERNEL_ADDRESS 0xad80 It doesn't happen after every reboot and breaks dtrace which treats the region 0.. VM_MIN_KERNEL_ADDRESS as toxic. How could we test against the lowest address the kernel may use? -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig Mmh, this seems to be a side effect of 1/2-KASLR which is enabled by default in GENERIC. Reminder: in GENERIC, we now randomize by default every kernel memory region. It is now possible that the direct map is below the main memory map -- it was not the case before. kmem_alloc uses the direct map for certain allocations. First, please confirm by also dumping PMAP_DIRECT_BASE and PMAP_DIRECT_END to make sure the area returned by kmem_alloc is indeed in the dmap when dtrace doesn't work. Then, it seems that dtrace should be taught to consider as toxic everything that is neither in the main map nor in the dmap. That is, only these ranges are valid: VM_MIN_KERNEL_ADDRESS -> VM_MAX_KERNEL_ADDRESS PMAP_DIRECT_BASE -> PMAP_DIRECT_END There is no actual "lowest" address the kernel may use; there are just two separate regions that are valid. (Plus the kernel image itself, if it matters.) It would work but unfortunately dtrace is a module and cannot use PMAP_DIRECT_BASE aka. pmap_direct_base. These variables are not visible from module context and don't exist if the kernel is built with KASAN. Ok. Solaris uses the toxic areas 0 .. _userlimit and all regions mapped from devices. Will the kernel ever map anything below VM_MIN_KERNEL_ADDRESS_DEFAULT? No. Do we have a mapping that contains all device-mapped memory? No, it's either the main map or the direct map.
Re: kmem_alloc() vs. VM_MIN_KERNEL_ADDRESS
Le 01/07/2019 à 19:40, J. Hannken-Illjes a écrit : Sometimes kmem_alloc() returns an address below VM_MIN_KERNEL_ADDRESS, something like kmem_alloc() -> 0xaba25800a5a8 with VM_MIN_KERNEL_ADDRESS 0xad80 It doesn't happen after every reboot and breaks dtrace which treats the region 0.. VM_MIN_KERNEL_ADDRESS as toxic. How could we test against the lowest address the kernel may use? -- J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig Mmh, this seems to be a side effect of 1/2-KASLR which is enabled by default in GENERIC. Reminder: in GENERIC, we now randomize by default every kernel memory region. It is now possible that the direct map is below the main memory map -- it was not the case before. kmem_alloc uses the direct map for certain allocations. First, please confirm by also dumping PMAP_DIRECT_BASE and PMAP_DIRECT_END to make sure the area returned by kmem_alloc is indeed in the dmap when dtrace doesn't work. Then, it seems that dtrace should be taught to consider as toxic everything that is neither in the main map nor in the dmap. That is, only these ranges are valid: VM_MIN_KERNEL_ADDRESS -> VM_MAX_KERNEL_ADDRESS PMAP_DIRECT_BASE -> PMAP_DIRECT_END There is no actual "lowest" address the kernel may use; there are just two separate regions that are valid. (Plus the kernel image itself, if it matters.)
Re: nvmm: change the api
Le 05/06/2019 à 21:26, Kamil Rytarowski a écrit : On 05.06.2019 08:08, Maxime Villard wrote: I intend to change the NVMM API in order to reduce the data movements. The patches are available here [1]. This looks good and it will make the interfaces simpler. Could we merge the demo example into /usr/share/examples/nvmm? https://www.netbsd.org/~maxv/nvmm/nvmm-demo.zip It will be easier and quicker to navigate it on disk whenever needed. I didn't feel like consuming disk space.
Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE
Le 22/06/2019 à 09:38, Michał Górny a écrit : Hi, Attached is the next version of unified patch. I've updated the boolean logic as requested, and fixed indentations. Looks good to me. Please don't use ()s when not needed, like - memset(&(xstate->xs_fxsave), 0, sizeof(xstate->xs_fxsave)); - process_s87_to_xmm(&fpu_save->sv_87, &(xstate->xs_fxsave)); + memset(&xstate->xs_fxsave, 0, sizeof(xstate->xs_fxsave)); + process_s87_to_xmm(&fpu_save->sv_87, &xstate->xs_fxsave); but whatever, this is detail
Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE
Le 22/06/2019 à 07:42, Michał Górny a écrit : On Sat, 2019-06-22 at 07:35 +0200, Maxime Villard wrote: Le 19/06/2019 à 14:20, Michał Górny a écrit :> On Wed, 2019-06-19 at 06:32 +0200, Maxime Villard wrote: Can you provide a unified patch so I can review quickly? Sorry for the delay Here you are. Note that the manpage is not updated yet, as I'm waiting for your comments on the xs_rfbm approach. + memcpy(&(xstate -> field), \ + (char*)fpu_save + x86_xsave_offsets[xsave_val], \ + sizeof(xstate -> field));\ Please use proper KNF here and in other places Are you talking of indentation? It would be really helpful to be more specific here. Yes memcpy(&xstate->field, \ (char*)fpu_save + x86_xsave_offsets[xsave_val], \ sizeof(xstate->field)); \ But that's just cosmetic + fpu_save->sv_xsave_hdr.xsh_xstate_bv = + (~xstate->xs_rfbm | xstate->xs_xstate_bv) & + (xstate->xs_rfbm | fpu_save->sv_xsave_hdr.xsh_xstate_bv); Can't we just do this? Seems simpler fpu_save->sv_xsave_hdr.xsh_xstate_bv = (fpu_save->sv_xsave_hdr.xsh_xstate_bv & ~xstate->xs_rfbm) | xstate->xs_xstate_bv; Unless I'm mistaken, this would prevent the user from being able to force xs_xstate_bv to 0 if it was set previously, and therefore clear registers to 'unmodified' state. If the user gives RFBM=all and XSTATE_BV=0, it does force the result to zero Unfortunately, in this code you will likely hit a race; between the moment you set fpu_save->sv_xsave_hdr.xsh_xstate_bv and the moment you reach COPY_COMPONENT, the thread could have been context-switched, and xstate_bv could have been clobbered. Unless I'm mistaken, this code should only be executed on stopped processes. Or at least that was the idea Kamil suggested to me. Mmh yes indeed, I even said it earlier in the thread; forget it, I was trying to remember where I had seen races in the FPU code, and it's not a problem here
Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE
Le 19/06/2019 à 14:20, Michał Górny a écrit :> On Wed, 2019-06-19 at 06:32 +0200, Maxime Villard wrote: Can you provide a unified patch so I can review quickly? Sorry for the delay Here you are. Note that the manpage is not updated yet, as I'm waiting for your comments on the xs_rfbm approach. + memcpy(&(xstate -> field), \ + (char*)fpu_save + x86_xsave_offsets[xsave_val], \ + sizeof(xstate -> field));\ Please use proper KNF here and in other places + fpu_save->sv_xsave_hdr.xsh_xstate_bv = + (~xstate->xs_rfbm | xstate->xs_xstate_bv) & + (xstate->xs_rfbm | fpu_save->sv_xsave_hdr.xsh_xstate_bv); Can't we just do this? Seems simpler fpu_save->sv_xsave_hdr.xsh_xstate_bv = (fpu_save->sv_xsave_hdr.xsh_xstate_bv & ~xstate->xs_rfbm) | xstate->xs_xstate_bv; Unfortunately, in this code you will likely hit a race; between the moment you set fpu_save->sv_xsave_hdr.xsh_xstate_bv and the moment you reach COPY_COMPONENT, the thread could have been context-switched, and xstate_bv could have been clobbered. I think you can work around this race by adding kpreempt_disable/enable at the beginning/end of the function. ISTR there is the same problem in the other fpregs ptrace functions. In fact, there are a lot of problems in our FPU code right now. Otherwise the rfbm approach seems good to me.
Re: ehci: fix error handling?
Le 25/05/2019 à 08:53, Iain Hibbert a écrit : On Sat, 25 May 2019, Maxime Villard wrote: To fix this problem, we need to fix the error handling in ehci_pci_attach(), so that goto fail does not leave 'sc' in an inconsistent state. What is the best way to do that? Should I somehow forcibly unregister the device, or should I add a flag in 'sc' to indicate whether it is initialized? the latter. It looks like there is no clean way to handle attach failures. attach always 'succeeds' in the sense that after attach has been called, detach will always be called. The detach routine should tear down everything that needs tearing down and not do things that will fail. Perhaps the init could simply be done before the attach routine gets the chance to fail? iain Ok, thanks, I've fixed it and the machine shuts down correctly now. However there seem to be other problems in the attach/detach paths. ehci_init() and ehci_detach() are not symmetrical; the former initializes more stuff than the latter deinitializes. ehci_pci.c deinitializes the extra stuff correctly, but the others (cardbus, arm, mips) don't.
Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE
Le 09/06/2019 à 09:46, Michał Górny a écrit : On Sun, 2019-06-09 at 09:08 +0200, Maxime Villard wrote: Le 07/06/2019 à 20:33, Michał Górny a écrit : [...] +int +process_machdep_doxstate(struct lwp *curl, struct lwp *l, struct uio *uio) + /* curl: tracer */ + /* l:traced */ +{ + int error; + struct xstate r; + char *kv; + ssize_t kl; + + kl = MIN(uio->uio_iov->iov_len, sizeof(r)); + kv = (char *) &r; + + kv += uio->uio_offset; + kl -= uio->uio_offset; + if (kl > uio->uio_resid) + kl = uio->uio_resid; + + if (kl < 0) + error = EINVAL; + else + error = process_machdep_read_xstate(l, &r); + if (error == 0) + error = uiomove(kv, kl, uio); + if (error == 0 && uio->uio_rw == UIO_WRITE) { + if (l->l_proc->p_stat != SSTOP) + error = EBUSY; Isn't it supposed to always be the case? To be honest, I've followed suit with all other getter-setters there. I can't say if it's fail-safe that's never supposed to fire or if there could be some mistaken use possible that would trigger this. I don't see other places that do that. If it was needed there would be a race, because it doesn't seem to me we lock the proc, so what if the proc goes != SSTOP after the check. + /* Copy MXCSR if either SSE or AVX state is requested */ + if (xstate->xs_xstate_bv & (XCR0_SSE|XCR0_YMM_Hi128)) { + memcpy(&fpu_save->sv_xmm.fx_mxcsr, &xstate->xs_fxsave.fx_mxcsr, 8); + + /* +* Invalid bits in mxcsr or mxcsr_mask will cause faults. +*/ + fpu_save->sv_xmm.fx_mxcsr_mask &= x86_fpu_mxcsr_mask; + fpu_save->sv_xmm.fx_mxcsr &= fpu_save->sv_xmm.fx_mxcsr_mask; Please use a simple assignment instead of memcpy, and also filter out 'xstate' and not 'fpu_save'. Will do. Actually, simple assignment means I can filter it out while assigning. Also, it would be nice to clarify the use case here. On x86, mxcsr gets reloaded *regardless* of whether xstate_bv contains SSE|AVX. Should the ptrace api also reload mxcsr regardless of whether the user requested SSE|AVX in xstate_bv? I was following the Intel programmer's manual. 
XRSTOR uses mxcsr if either SSE or AVX is requested via EAX. Yes, but EAX is the RFBM, not XSTATE_BV. + } + + /* Copy SSE state if requested. */ + if (xstate->xs_xstate_bv & XCR0_SSE) { + if (x86_fpu_save >= FPU_SAVE_XSAVE) { + KASSERT(fpu_save->sv_xsave_hdr.xsh_xstate_bv & XCR0_SSE); Mmh, maybe it is possible that this KASSERT fires, if the LWP hasn't used SSE. I don't think that can actually happen with XSAVE. I wouldn't be sure with XSAVEOPT but we're not using it. In fact, what is the purpose? The exact purpose is to verify that a request to write an unsupported component didn't get through process_verify_xstate(). It's an assert, so it means to check for something-we-really-don't-expect-to-happen. The KASSERT is on 'fpu_save', not 'xstate', so process_verify_xstate() doesn't play any role here. My point was: if the traced proc never used SSE, then hardware.xstate_bv[SSE]=0 (which means, the "init" state). And the KASSERT fires. A few other things: * In process_machdep_doxstate(), need to add a memset(&r, 0, sizeof(r)); * In process_write_xstate(), need to OR the xstate_bv in 'fpu_save' with the xstate_bv in 'xstate', otherwise the next XRSTOR of the traced proc may not install the registers correctly * Is there any urgency in adding support for XSTATE in ptrace? Because I am currently rewriting the FPU code entirely * I think it would be easier if you hosted your patches online, because the repeated threads make it hard to follow what's going on
Re: [PATCH v3 1/2] Fetch XSAVE area component offsets and sizes when initializing x86 CPU
Le 09/06/2019 à 09:33, Michał Górny a écrit : On Sun, 2019-06-09 at 09:20 +0200, Maxime Villard wrote: In fact, the whole loop seems wrong: CPUID uses the XCR0_ constants, not the XSAVE_ constants. Eg HDC is 13 in XCR0_ but 10 in XSAVE_, so you never iterate over it. It's intentional. I only iterate up to XSAVE_MAX_COMPONENT, i.e. the last component actually used by the kernel. I don't skip earlier unused components to avoid index madness but I see no purpose to go beyond the last component actually used. Ok, that's fine for now and we can revisit that later for future states. However, maybe rename XSAVE_ -> XCR0_SHIFT_, or something else, to make it clear that this depends on the hardware layout.
Re: [PATCH v3 1/2] Fetch XSAVE area component offsets and sizes when initializing x86 CPU
Le 07/06/2019 à 20:33, Michał Górny a écrit :

	+#define XSAVE_MAX_COMPONENT	XSAVE_Hi16_ZMM
	[...]
	+size_t x86_xsave_offsets[XSAVE_MAX_COMPONENT] __read_mostly;
	+size_t x86_xsave_sizes[XSAVE_MAX_COMPONENT] __read_mostly;

Need +1, but see below.

	+	/* Get component offsets and sizes for the save area */
	+	for (i = XSAVE_YMM_Hi128; i < __arraycount(x86_xsave_offsets); i++) {
	+		if (x86_xsave_features & ((uint64_t)1 << i)) {

I would use __BIT(i). In fact, the whole loop seems wrong: CPUID uses the XCR0_ constants, not the XSAVE_ constants. E.g. HDC is 13 in XCR0_ but 10 in XSAVE_, so you never iterate over it. I would drop XSAVE_* entirely and just use XCR0_*, it is forward compatible.

	-#define __NetBSD_Version__	899004200	/* NetBSD 8.99.42 */
	+#define __NetBSD_Version__	899004300	/* NetBSD 8.99.43 */

No version bump needed for that.
Re: [PATCH v3 2/2] Implement PT_GETXSTATE and PT_SETXSTATE
Le 07/06/2019 à 20:33, Michał Górny a écrit : [...]

	+int
	+process_machdep_doxstate(struct lwp *curl, struct lwp *l, struct uio *uio)
	+	/* curl: tracer */
	+	/* l: traced */
	+{
	+	int error;
	+	struct xstate r;
	+	char *kv;
	+	ssize_t kl;
	+
	+	kl = MIN(uio->uio_iov->iov_len, sizeof(r));
	+	kv = (char *) &r;
	+
	+	kv += uio->uio_offset;
	+	kl -= uio->uio_offset;
	+	if (kl > uio->uio_resid)
	+		kl = uio->uio_resid;
	+
	+	if (kl < 0)
	+		error = EINVAL;
	+	else
	+		error = process_machdep_read_xstate(l, &r);
	+	if (error == 0)
	+		error = uiomove(kv, kl, uio);
	+	if (error == 0 && uio->uio_rw == UIO_WRITE) {
	+		if (l->l_proc->p_stat != SSTOP)
	+			error = EBUSY;

Isn't it supposed to always be the case?

	+	/* Copy MXCSR if either SSE or AVX state is requested */
	+	if (xstate->xs_xstate_bv & (XCR0_SSE|XCR0_YMM_Hi128)) {
	+		memcpy(&fpu_save->sv_xmm.fx_mxcsr, &xstate->xs_fxsave.fx_mxcsr, 8);
	+
	+		/*
	+		 * Invalid bits in mxcsr or mxcsr_mask will cause faults.
	+		 */
	+		fpu_save->sv_xmm.fx_mxcsr_mask &= x86_fpu_mxcsr_mask;
	+		fpu_save->sv_xmm.fx_mxcsr &= fpu_save->sv_xmm.fx_mxcsr_mask;

Please use a simple assignment instead of memcpy, and also filter out 'xstate' and not 'fpu_save'. Also, it would be nice to clarify the use case here. On x86, mxcsr gets reloaded *regardless* of whether xstate_bv contains SSE|AVX. Should the ptrace api also reload mxcsr regardless of whether the user requested SSE|AVX in xstate_bv?

	+	}
	+
	+	/* Copy SSE state if requested. */
	+	if (xstate->xs_xstate_bv & XCR0_SSE) {
	+		if (x86_fpu_save >= FPU_SAVE_XSAVE) {
	+			KASSERT(fpu_save->sv_xsave_hdr.xsh_xstate_bv & XCR0_SSE);

Mmh, maybe it is possible that this KASSERT fires, if the LWP hasn't used SSE. In fact, what is the purpose? Also general notes:
* Please KNF
* Don't add too many comments when the code is simple enough
nvmm: change the api
I intend to change the NVMM API in order to reduce the data movements. The patches are available here [1]. Basically, until now the API expected the user to create structures and then pass them to a libnvmm function. Typically:

	/* The structures. */
	struct nvmm_machine mach;
	struct nvmm_x64_state state;
	struct nvmm_event event;
	struct nvmm_exit exit;

	/* Create the machine and VCPU0. */
	nvmm_machine_create(&mach);
	nvmm_vcpu_create(&mach, 0);

	/* Set RAX=123. */
	nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
	state.gprs[NVMM_X64_GPR_RAX] = 123;
	nvmm_vcpu_setstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);

	/* Inject an #UD exception. */
	event.type = NVMM_EXCEPTION;
	event.vector = 6;
	nvmm_vcpu_inject(&mach, 0, &event);

	/* Run. */
	nvmm_vcpu_run(&mach, 0, &exit);
	switch (exit.reason) { ... }

This design has some problems: the structures can occupy a lot of stack (nvmm_x64_state is > 1000 bytes), and must be copied in and out of the comm page by libnvmm. We can avoid this by giving the user direct access to the comm page. A new nvmm_vcpu structure is introduced, and contains pointers to the comm page. The new usage is:

	/* The structures. */
	struct nvmm_machine mach;
	struct nvmm_vcpu vcpu;

	/* Create the machine and VCPU0. */
	nvmm_machine_create(&mach);
	nvmm_vcpu_create(&mach, 0, &vcpu);

	/* Set RAX=123. getstate populates vcpu.state->gprs[]. */
	nvmm_vcpu_getstate(&mach, &vcpu, NVMM_X64_STATE_GPRS);
	vcpu.state->gprs[NVMM_X64_GPR_RAX] = 123;
	nvmm_vcpu_setstate(&mach, &vcpu, NVMM_X64_STATE_GPRS);

	/* Inject an #UD exception. */
	vcpu.event->type = NVMM_EXCEPTION;
	vcpu.event->vector = 6;
	nvmm_vcpu_inject(&mach, &vcpu);

	/* Run. */
	nvmm_vcpu_run(&mach, &vcpu);
	switch (vcpu.exit->reason) { ... }

No kernel changes are required. This should be part of NetBSD 9.

[1] https://m00nbsd.net/garbage/nvmm/nvmm-nocopy.zip
ehci: fix error handling?
Recent Lenovo machines have a Realtek USB 1.1 controller with EHCI. NetBSD tries to attach it in dev/pci/ehci_pci.c::ehci_pci_attach(), and does goto fail with "pre-2.0 USB rev". The thing is, the device remains registered. When I try to shut down the machine, ehci_pci_detach() gets called and crashes, because we didn't completely initialize 'sc' in ehci_pci_attach(). In particular, it triggers a KASSERT in callout_halt(), because callout_init() was not called before. To fix this problem, we need to fix the error handling in ehci_pci_attach(), so that goto fail does not leave 'sc' in an inconsistent state. What is the best way to do that? Should I somehow forcibly unregister the device, or should I add a flag in 'sc' to indicate whether it is initialized? It looks like there is no clean way to handle attach failures. Currently it is impossible to cleanly shut down or reboot this machine.
Re: Removing PF
Le 02/04/2019 à 22:29, Mouse a écrit : I continue to use pf and not npf because : [...] However, I must say I'm still a bit confused by this answer (and the others I've seen). Do you understand that PF is a clear security risk for your system? Is it? Do you know MLH's systems enough to know whether any of the known vulnerabilities are relevant? I don't. That's a really simplistic answer; if by any chance he happens to avoid every vuln found until now then it's fine to use PF? What about the next one that comes in, will he still be lucky? No one will fix it, and it's not viable. Or, he could be using PF in an entirely offline network, in which case it is indeed fine (but that doesn't seem to be his particular case). If you really want to use PF, I would recommend that you switch to another OS, for your own safety. PF has no future in NetBSD. It doesn't? It seems to me, from the lack of consensus I'm seeing here, that that remains to be seen. We've been seeing it for eleven years, but we can wait another eleven years in case you still have a doubt about whether anyone wants to work on PF...
Re: Removing PF
Le 02/04/2019 à 20:46, MLH a écrit : I continue to use pf and not npf because:
1) I couldn't get std rulesets to seem to work (been a while though)
2) no port redirection
3) dynamic ruleset use didn't appear to be adequate
4) greylisting (not just email) for custom stuff that I can't see how to support in npf.
5) Needs far more documentation and help than I have seen.
I would like to move to npf as some future features look nice (SYN floods, DoS attacks, etc). However, in addition to std rulesets, etc, I use log followers to block attacks. While not the main security, they really help hold down traffic, etc. and I'm not anywhere near willing to give them up. I tried using blacklistd but never could get it to work (also been a while). At least people answer the question that was asked, so thanks for that already. However, I must say I'm still a bit confused by this answer (and the others I've seen). Do you understand that PF is a clear security risk for your system? Or do you understand, but just not care much? Sure, PF has features NPF doesn't have; but a firewall is supposed to stop the fire, not create the conditions for it to spread. And sure, all software has bugs, but you don't need a Nobel prize to understand that 11-year-old unmaintained software has many more bugs than its up-to-date version; in the case of PF this is demonstrably proven. In essence, if you really don't care, then keeping PF may indeed not be a real problem for us, beyond looking a bit irresponsible. I mean, we don't care either if you give your credit card number to every stranger who calls you on the phone... some responsibility is on the user's side. However, I do believe that our responsibility is still to prevent confusion, even when it implies removing some features. Yes, it is sad that you can't use ftp-proxy on NPF for now, yes, NPF's syntax is not the same as PF's, and so on. But NPF equally has many advanced benefits that you don't get with PF.
If you really want to use PF, I would recommend that you switch to another OS, for your own safety. PF has no future in NetBSD. It's been one decade of this, at some point we need to cut the crap.
Re: Removing PF
Le 30/03/2019 à 19:22, Michael van Elst a écrit : On Sat, Mar 30, 2019 at 04:55:53PM +0100, Maxime Villard wrote: Of course, if it's really about rototilling things actively in other places and to limit the amount of work to fix the fallout, it suddenly becomes reasonable to remove something you personally don't care about. ... yes, that's part of the goal, removing dead wood makes So, whatever you do not care about, you call dead wood. ... I'm still waiting for you to provide a list of people who maintain PF in NetBSD ... still waiting ... I'm also waiting for you to provide actual answers rather than one-line quotes from my emails, taken out of context, and used to draw glaringly wrong conclusions (if any) ... but I guess no point waiting for that to happen ... We should remember that. ... sure, meanwhile you didn't really answer the core of the issue, which I think was stated clearly by Sevan ...
Re: GSoC 2019 - Porting wine to amd64
Le 30/03/2019 à 20:14, zerous a écrit : On Sat, Mar 30, 2019 at 05:15:27PM +0100, Maxime Villard wrote: Kindly find the draft attached along with this mail. I would be glad to receive feedback on the same. This sounds really good. However, I have to warn you: this project is more a userland project than a kernel project, and the amount of kernel work will be limited. But if that's fine on your side, that's fine on ours. That is absolutely fine with me. I hope it will help me know where to look when I find myself lost in the source code and moreover, I will get the chance to interact with the developers who built it. So, I am really looking forward to doing this project. Great, in this case I believe the next step is that you submit your proposal on the GSoC website, and we'll pick it up from there.
Re: Removing PF
Le 30/03/2019 à 20:26, Michael van Elst a écrit : On Sat, Mar 30, 2019 at 08:10:21PM +0100, Maxime Villard wrote: ... sure, meanwhile you didn't really answer to the core of the issue, which I think was stated clearly by Sevan ... The issue is that we need to work on npf before we can drop other code. ... the questions raised were: why would someone use an insecure firewall? ... and isn't it irresponsible to provide an insecure firewall? ... you still fail to answer ... I see fewer and fewer reasons to keep talking to you, given your clear inability to answer in good faith ...
Re: Removing PF
Le 30/03/2019 à 16:24, Michael van Elst a écrit : On Sat, Mar 30, 2019 at 03:11:22PM +0100, Maxime Villard wrote: The best way forward to drop PF is to actively develop NPF. ... and the best way to actively develop NPF is to stop splitting effort on three firewalls So now you stop maintaining PF and IPF and redirect your efforts to NPF ? If not you, who does ? Somehow this doesn't fit into your narrative that there are no maintainers. ... yes, when PF is not there I don't have to worry about PF-related PRs that report huge vulns in the general indifference, and I don't need to waste time investigating, fixing, and documenting the vulns because I feel like I should care for the poor vulnerable people that use PF ... there indeed are no maintainers, just people like me who make (or rather, have made, and don't want to anymore) changes to prevent the code from collapsing completely, or who mechanically change APIs without ever testing for real, thereby possibly adding even more bugs and vulns than there already are ... Of course, if it's really about rototilling things actively in other places and to limit the amount of work to fix the fallout, it suddenly becomes reasonable to remove something you personally don't care about. ... yes, that's part of the goal, removing dead wood makes stuff easier to maintain and improve, and as discussed in other threads, things that have legitimate reasons to go, have legitimate reasons to go ... and stop directing users to the three of them ... It's a choice, users like that, in particular if that choice makes a difference. Someone like you should understand this. ... I've already said I understand the "missing features" aspect of things, but I've also already questioned whether it was a really good argument, because I don't understand how someone can possibly want a flawed firewall, which, in the case of PF, can open more security holes than it plugs ... so, why? ...
Sevan has summed up pretty well the underlying question: isn't it irresponsible to ship an insecure firewall and give our users the impression that they can use it, or even the possibility to use it ... so, isn't it irresponsible?
Re: GSoC 2019 - Porting wine to amd64
Le 29/03/2019 à 12:13, zerous a écrit : Hi, I strongly feel like commiting to the project of porting Wine to amd64 on NetBSD. My reasons for this liking include my experience with BSD/UNIX, C and shell in general. I have always wanted to do systems engineering which deals with the kernel and other subsystems. I have 4 years of experience in system administration and programming in C. I decided to take up this gig primarily because it deals with emulation and as a KERNEL option is involved I might as well use this opportunity to learn more about kernel modules. I have written a draft-proposal for the project after consulting with some of the netbsd developers like @medfly, @martin and @leot. Kindly find the draft attached along with this mail. I would be glad to receive feedback on the same. This sounds really good. However, I have to warn you: this project is more a userland project than a kernel project, and the amount of kernel work will be limited. But if that's fine on your side, that's fine on ours.
Re: Removing PF
Le 30/03/2019 à 14:34, Michael van Elst a écrit : On Sat, Mar 30, 2019 at 02:18:29PM +0100, Piotr Meyer wrote: On Sat, Mar 30, 2019 at 12:59:00PM -, Michael van Elst wrote: Just to get the facts straight: NPF has a bigger market share *outside* NetBSD, at least certainly for commercial users. They also contribute. Ironically, the same is also true for PF... But contributions to PF are made against the OpenBSD kernel, true? And the essence of the problem is that: - the NetBSD version of PF is too old to adapt them without a ton of work, That's self-evident, the "ton of work" is of course just the same that would have been needed if you had tracked pf development. ... which doesn't take anything away from the fact that no one has been willing to do this work over the last 10 years, and likely not in the next 10 years either ... - even current PF in OpenBSD doesn't fit very well into MP kernels Whatever "MP kernel" is. It no longer fits that well to our recent MP changes of the network stack, but that mostly results in being less efficient, not in being more difficult to port or track. ... reports from FreeBSD on the matter indicate that it is more difficult to port, unsurprisingly, not to mention that having PF will then be an obstacle to enabling full MP-safety in GENERIC ... so no, it's not just less efficient, it becomes a big obstacle to full MP-safety in GENERIC ... and that makes "fresh" import cumbersome and fruitless It's always cumbersome to import foreign code. NPF won't be any different, but since it is a newer development, it has some headstart. Don't believe NPF is something "native" and thus easier to track, see the recent change from our "native" proplib to its proprietary library. ... NPF is NetBSD's native firewall and is well integrated in the NetBSD kernel, the recent switch from proplib to libnv is very anecdotal, if anything ... Le 30/03/2019 à 14:47, Michael van Elst a écrit : That's a myth. 
When I switched from IPF to NPF, the first thing required was to fix NPF and NPF-related bugs. The NetBSD version of it was pretty much unmaintained, it still is to a lesser degree. ... unmaintained as in "it was 2 months behind the Github version", not as in "it was 11 years behind the OpenBSD version" ... the NPF development is jumping between NetBSD and Github, with changes being regularly merged in both directions by Mindaugas from time to time ... The best way forward to drop PF is to actively develop NPF. ... and the best way to actively develop NPF is to stop splitting effort on three firewalls and stop directing users to the three of them ... it's also to stop this illusional/irrational approach by which it is fine to use inflammable firewalls ... yes, each software has bugs, but maintained software benefits from improvements, outdated software doesn't ... outdated software must be kept synchronized, which brings us back to the first point, no one wants to work on that ...
Re: Removing PF
Le 30/03/2019 à 11:06, Manuel Bouyer a écrit : On Sat, Mar 30, 2019 at 06:07:01AM +0100, Maxime Villard wrote: [...] If the effort hadn't been split on three firewalls in the last 10 years, but rather had been focused on only one, don't you think NPF would now be featureful? Exactly. If work had been done to improve ipf instead of reinventing the wheel, ipf would probably be in much better shape now. Well, you're not wrong, but there were also many good reasons for developing a firewall from scratch, and NPF is now used in commercial products, while IPF is being dropped by commercial products. NPF is not a failure. I'm still using ipf because npf misses the features I need (essentially groups). Each time I mentioned this, no comments on this topic were made. This is one of the reasons why we keep IPF for now, to wait for NPF to reach feature parity. This point hasn't been ignored. But this thread is about PF, not IPF, and was prompted by the two recent PF vulns that were not fixed in NetBSD by anyone. I'm talking about the current state of affairs, and how to move forward. The current state of affairs is that PF is in a deplorable state compared to NPF, which "only" lacks a few features. The argument made about "ftp-proxy", while understandable, does remain highly questionable. PF hasn't received any security updates in NetBSD, so do people really wish to still use it for ftp-proxy? It seems silly to use an outdated firewall, because it creates more security holes than it plugs. Firewalls are not supposed to be inflammable. Please do not confuse the two: this thread is about PF, not IPF. I believe we can easily resolve the ftp-proxy support issue on NPF.
Re: Removing PF
Le 30/03/2019 à 10:06, BERTRAND Joël a écrit : Michael van Elst a écrit : joel.bertr...@systella.fr (=?UTF-8?Q?BERTRAND_Jo=c3=abl?=) writes: Just a question. I use NPF for a while on NetBSD-8 that provides router functionalities (over a lot of network interfaces with agr, bridge, vpn and some others...). NPF runs fine but I'm unable to configure ftp-proxy. Last time I have checked, ftp proxy doesn't work and I have tried to fix ftp-proxy (src/dist/pf/usr.sbin/ftp-proxy) without any success. A lot of npf code is missing or broken. NPF so far doesn't implement a ftp proxy. The ftp-proxy you found is for PF. I know. But without a function like ftp-proxy, npf cannot replace fp. ftp-proxy seems to be a very simple code. Before dropping fp, I think ftp-proxy has to run with npf. JB Fair enough, this _is_ an actual reason for not removing PF right now. I'll look at how the ftp-proxy stuff can work on NPF.
Re: Removing PF
Le 30/03/2019 à 08:51, Michael van Elst a écrit : On Sat, Mar 30, 2019 at 08:17:44AM +0100, Maxime Villard wrote: Le 30/03/2019 à 08:07, Michael van Elst a écrit : m...@m00nbsd.net (Maxime Villard) writes: If the effort hadn't been split on three firewalls in the last 10 years, but rather had been focused on only one, don't you think NPF would now be featureful? Now if the effort of the world hadn't been split among Windows, MacOS and Linux, but rather had been focused on only NetBSD, don't you think NetBSD would now be featureful ? Yes, if the effort had been on one firewall instead of three, the one chosen would be more functional. Period. You have absolutely no point, do you? You just don't understand the point. Period. Hey, I said Period, it must be true. Do you believe people would give up their efforts on Windows, or MacOS or Linux to focus on NetBSD ? Ah, silly me, too abstract. Do you really believe the maintainers of ipf and npf would give up their projects to work on pf ? Ah because there are maintainers of ipf? There is _no_ maintainer of ipf whatsoever. Do you understand, or not? Give me a list of people who maintain ipf. Just like, give me a list of people who maintain pf in NetBSD. The point is that you only alienate people that are not following your decision. Period. Hah, again. Decision? I haven't taken any decision, I've made a clear and logical proposal to drop PF, that I am sending to a public mailing list for further discussion. You, on the other side, have put forward zero clear argument, and have just thrown random, inaccurate and irrelevant comparisons, that leave my proposal completely unchallenged.
Re: Removing PF
Le 30/03/2019 à 10:40, Maxime Villard a écrit : Le 30/03/2019 à 10:06, BERTRAND Joël a écrit : Michael van Elst a écrit : joel.bertr...@systella.fr (=?UTF-8?Q?BERTRAND_Jo=c3=abl?=) writes: Just a question. I use NPF for a while on NetBSD-8 that provides router functionalities (over a lot of network interfaces with agr, bridge, vpn and some others...). NPF runs fine but I'm unable to configure ftp-proxy. Last time I have checked, ftp proxy doesn't work and I have tried to fix ftp-proxy (src/dist/pf/usr.sbin/ftp-proxy) without any success. A lot of npf code is missing or broken. NPF so far doesn't implement a ftp proxy. The ftp-proxy you found is for PF. I know. But without a function like ftp-proxy, npf cannot replace fp. ftp-proxy seems to be a very simple code. Before dropping fp, I think ftp-proxy has to run with npf. JB Fair enough, this _is_ an actual reason for not removing PF right now. I'll look at how the ftp-proxy stuff can work on NPF. Hum, wait a minute in here. Am I being totally dumb, or NPF does support ftp-proxy since 2011? See src/dist/pf/usr.sbin/ftp-proxy/npf.c, and the associated commit message.
Re: Removing PF
Le 30/03/2019 à 08:07, Michael van Elst a écrit : m...@m00nbsd.net (Maxime Villard) writes: If the effort hadn't been split on three firewalls in the last 10 years, but rather had been focused on only one, don't you think NPF would now be featureful? Now if the effort of the world hadn't been split among Windows, MacOS and Linux, but rather had been focused on only NetBSD, don't you think NetBSD would now be featureful ? Yes, if the effort had been on one firewall instead of three, the one chosen would be more functional. Period. You have absolutely no point, do you?
Re: Removing PF
Le 29/03/2019 à 22:10, Brian Buhrow a écrit : ... And, pray tell, why is it harder to maintain a modern version of PF than NPF? ... It is about the _fourth_ time you ask this question, and for the fourth time I am going to provide the exact same answer. PF has a legacy design that makes it difficult to import in MP-safe kernels like NetBSD or FreeBSD. NPF is already MP-safe and well integrated in the NetBSD kernel, and that's why it is a lot easier to work on. Is it clear now, or do you want to ask a fifth time? Again, if it was as easy as you think it is, we would likely have maintained it. The reality is that no one is willing to maintain PF. At the risk of being extremely rude, although that is not my intention, if as much effort had been put into improving the functionality of NPF as has been put into the effort to get pf removed, I dare say NPF would now be up to PF in functionality and the conversation would be easier to have. If the effort hadn't been split on three firewalls in the last 10 years, but rather had been focused on only one, don't you think NPF would now be featureful? The fragmented effort that has gone into fixing here and there crazy bugs in PF and IPF, just because random people (like you) wanted their outdated firewall not to collapse, is precisely what has brought us to the current state, that is, three half-baked firewalls, none of which is 100% functional or featureful. I'm even talking from experience here; last year I was looking at network-related PRs, and I found a PR from 2010 where a guy had reported that a single TCP packet could crash the kernel under PF. It turned out that our version of PF had a signedness bug that could cause a length check to pass, which caused a fatal memory corruption. Checking the guy's github, he even had written in 2010 an exploit for it. So for 8 years, the vuln had been sitting in our Gnats with a public exploit, and strictly no one giving a fuck about it. 
I proceeded to write a reproducer, fix the vuln, then pull it up, then write a security advisory for it. All of that is an awful waste of time and energy, for everybody. Perpetuating this crazy status quo is not going to make the situation any better, it will just make us continue splitting efforts. If we already manage to go from three firewalls to only two, it's a good progress, and again, it reduces the maintenance burden on the kernel. In the case of PF, I've already said it in the internal discussion (but not on tech-kern), if we wanted to import a new version, it would likely be better to remove the one we have, and make a clean import from scratch. In any case, this proposal (about removing PF) is not illegitimate or stupid, and it makes sense, given the awful state of PF. If you want to use outdated firewalls that haven't received security fixes, you can as well follow the NetBSD-6 branch, or older. And, once done, all those features would have to be maintained going forward, No, NPF is homegrown, each feature added doesn't need to be regularly maintained, contrary to PF. not to mention that there would be absolutely no testing outside of the NetBSD environment with NPF. As a matter of fact this is incorrect, NPF works on Linux too, there have been tests/development made on CentOS. I also have a vague memory of NPF being embedded in DPDK. So no, NPF is definitely not NetBSD-specific, and has received testing outside.
Removing PF
There have been internal discussions about removing PF from NetBSD. Currently, NetBSD's PF is 11 years old, has received no maintenance, and has accumulated bugs and vulnerabilities that were fixed upstream but not in NetBSD. The latest examples are two vulnerabilities recently discovered in PF that haven't been fixed in NetBSD's PF for lack of interest. Importing recent versions of PF into scalable/performant kernels is a huge amount of work because of PF's legacy design, and there have been reports that FreeBSD is also considering dropping PF. Just like other kinds of dead wood, NetBSD's PF consumes APIs, makes stuff harder to change, and has now reached a point where it is lagging behind upstream way too much to still be considered a functional or secure firewall on NetBSD. NetBSD provides NPF, a clean, secure and scalable firewall, enabled by default, that can be used instead, even if it doesn't have all the features PF has for now. It is to be noted that IPF too is present in NetBSD, although its use is not recommended (for other reasons). Given NPF's advanced design and good integration in the NetBSD kernel, trying to maintain PF seems like a huge effort for little benefit, and the resources would be better spent on NPF. Even if we overcame the effort needed to import a new version of PF, we would still have to maintain it and regularly synchronize against upstream. Overall, keeping PF is not viable, as the state its code finds itself in today already proves.
Re: Regarding the ULTRIX and OSF1 compats
Le 24/03/2019 à 00:41, Thor Lancelot Simon a écrit : On Sun, Mar 17, 2019 at 05:41:18AM +0800, Paul Goyette wrote: On Sat, 16 Mar 2019, Maxime Villard wrote: Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper dependency in the module), because there are osf1_* calls. Some more compat mess to untangle, it seems... In all cases, it's only a few functions that are just wrappers, so it probably shouldn't be too complicated to solve. It's a total of 15 functions (I generated this list by building an alpha GENERIC kernel with COMPAT_OSF1 removed): osf1_sys_wait4 osf1_sys_mount [...] There's a funny historical reason for this -- Linux on Alpha originally used OSF/1 syscall numbering and args for its own native syscalls. No, our implementation need not actually reflect that -- but it's an interesting historical detail perhaps we should write down somewhere. Does that mean we don't need these syscalls?
Re: Regarding the ULTRIX and OSF1 compats
Le 16/03/2019 à 22:41, Paul Goyette a écrit : On Sat, 16 Mar 2019, Maxime Villard wrote: Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper dependency in the module), because there are osf1_* calls. Some more compat mess to untangle, it seems... In all cases, it's only a few functions that are just wrappers, so it probably shouldn't be too complicated to solve. It's a total of 15 functions (I generated this list by building an alpha GENERIC kernel with COMPAT_OSF1 removed): osf1_sys_wait4 osf1_sys_mount osf1_sys_set_program_attributes osf1_sys_setitimer osf1_sys_select osf1_sys_gettimeofday osf1_sys_getrusage osf1_sys_settimeofday osf1_sys_utimes osf1_sys_statfs osf1_sys_fstatfs osf1_sys_sysinfo osf1_sys_usleep_thread osf1_sys_getsysinfo osf1_sys_setsysinfo All of these are references from linux_sysent.o Here is a patch [1] that untangles the dependency. Cross-compile-tested on alpha from amd64. [1] https://m00nbsd.net/garbage/compat/alpha.diff
Re: Regarding the ULTRIX and OSF1 compats
Le 16/03/2019 à 22:41, Paul Goyette a écrit : On Sat, 16 Mar 2019, Maxime Villard wrote: Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's COMPAT_LINUX uses COMPAT_OSF1 as dependency (even if there is no proper dependency in the module), because there are osf1_* calls. Some more compat mess to untangle, it seems... In all cases, it's only a few functions that are just wrappers, so it probably shouldn't be too complicated to solve. It's a total of 15 functions (I generated this list by building an alpha GENERIC kernel with COMPAT_OSF1 removed): osf1_sys_wait4 osf1_sys_mount osf1_sys_set_program_attributes osf1_sys_setitimer osf1_sys_select osf1_sys_gettimeofday osf1_sys_getrusage osf1_sys_settimeofday osf1_sys_utimes osf1_sys_statfs osf1_sys_fstatfs osf1_sys_sysinfo osf1_sys_usleep_thread osf1_sys_getsysinfo osf1_sys_setsysinfo All of these are references from linux_sysent.o Yes. Most of these functions are basic wrappers, that I think we can just gather into a linux_misc.c or similar. Do we know if COMPAT_OSF1 has actually ever been supported on HPPA? It is commented out in GENERIC since rev1, and there is some Alpha-specific code in osf1_misc.c that could hardly have compiled on HPPA.
Re: Regarding the ULTRIX and OSF1 compats
Le 16/03/2019 à 20:49, m...@netbsd.org a écrit : It makes me compelled to delete more of it. COMPAT_LINUX doesn't work without matching PAGE_SIZE. What does that mean?
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 17:44, Dave McGuire wrote: On 3/16/19 11:04 AM, Maxime Villard wrote: I think that what Robert, and others (including me), argue is actually that things should not be removed, and the reason would be that this is the core mission, purpose, reason (or whatever you want to call it) for NetBSD's existence. Instead it should be fixed, because that is what it all is about. Make it work - don't remove it. It seems our disagreement comes down to that, indeed. It seems to me this has already been answered, too, in previous discussions. Sigh. So I see you're at it again. You are obviously quite obsessed with chopping out functionality that you have unilaterally decided is not worth having in NetBSD anymore. Could you possibly be more dishonest than this? Impressive. Keep up the good work. Just who the hell do you think you are, anyway, and why are you so obsessed with trashing everything that existed before you graced us with your presence? I am one of the very few people that have continuously and extensively worked on the NetBSD kernel over the last five years, to bring stability, performance and security improvements in all of the system. I've redesigned a good part of our x86 port, I've developed all the complex fixes for the speculation bugs, I've worked on securing our network stack, our syscalls, our allocators, I've developed comprehensive hypervisor support, the strongest KASLR existing to date, and apart from that, I've developed numerous bug detection systems that have fixed several hundreds of bugs in NetBSD, among many other miscellaneous things I've done in and for NetBSD. And you, who are you exactly? You sound like a clueless fatass. If you have such a problem with the COMPAT_* layers, go fix them. I realize that would be more work, and it'd be easier to just trash them, but that's not what NetBSD is about, and your attitude about this is most unwelcome, coming from this NetBSD user since v0.9. 
You need a serious adjustment of perspective. As others have suggested, please go read about what NetBSD is, its clearly-stated mission, and its spirit. As I said, keep up the good work. Anyway, you've just lost what was left of your credibility; your mails go directly to the trash from now on.
Re: pool: removing ioff?
On 16/03/2019 at 18:00, Greg Troxel wrote: Maxime Villard writes: I would like to remove the 'ioff' argument from pool_init() and friends, documented as 'align_offset' in the man page. This parameter allows the caller to specify that the alignment given in 'align' is to be applied at the offset 'ioff' within the buffer. I think we're better off with hard-aligned structures, i.e. with __aligned(32) in the case of XSCALE. Then we just pass align=32 in the pool, and that's it. I would prefer to avoid any confusion in the pool initializers and drop ioff, rather than having this kind of marginal and not-well-defined feature that adds complexity with no real good reason. Note also that, as far as I can tell, our policy in the kernel has always been to hard-align the structures, and then pass the same alignment in the allocators. I am not objecting as I can't make a performance/complexity argument. But, I wonder if this comes from the Solaris allocation design, and that the ioff notion is not about alignment for 4/8-byte objects to fit the way the CPU wants, but for, say, 128-byte objects to be lined up at various different offsets in different pages to make caching work better. But perhaps that doesn't exist in NetBSD, or is done differently, or my memory of the paper is off. Indeed, Sun's SLAB had cache coloring. We do too in our pool subsystem; that's not related to ioff, and we don't lose it as a result of removing ioff.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 17:00, Jason Thorpe wrote: On Mar 16, 2019, at 5:09 AM, m...@netbsd.org wrote: Most likely, COMPAT_ULTRIX and COMPAT_OSF1 have the same type of bugs that we have seen in compatibility layers elsewhere. Is it worth his time to test them? Folks, PLEASE. This is a point I've tried to make repeatedly... These need to be taken on a case-by-case basis. [...] Yes... -_- Anyway, people, let's wrap it up. The status right now is that I've disabled COMPAT_OSF1 on Alpha, and I'm not sure if there is an improper dependency of COMPAT_LINUX on COMPAT_OSF1, because I see calls like osf1_sys_wait4 in sys/compat/linux/arch/alpha/, but there is no dependency in the modules. Right now I'm waiting to see if the build cluster fails, to know for sure. If there is indeed a dependency, we'll have to untangle it before retiring COMPAT_OSF1. I don't think it is really complicated; the functions used are just wrappers that we can easily move into Alpha's COMPAT_LINUX.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 16:12, Robert Elz wrote: Date: Sat, 16 Mar 2019 14:28:27 +0100 From: Maxime Villard Message-ID: <17ba7752-793d-d352-09ef-c43676d2f...@m00nbsd.net> | Ok. So you believe that dead wood should hold back all of the development | process? No, but only if it truly is dead. Just because you don't see the utility does not make it so. | I believe you misread, You're right, I did, I just skimmed the patch you made, and did not look carefully enough. But the point is unchanged. There was a bug. When it was reported it was fixed. That's what's supposed to happen, and isn't evidence of anything being wrong. | As I said, this is one example among others. Yes, so you keep saying. But those others are in one of three groups. Either they're like this one, and have now been fixed, in which case they're irrelevant. Or they have been reported, and no-one is fixing them (which can happen for many reasons), in which case you should just supply the PR numbers for all of these problems; that might stir some activity around some of them, or it might not. Or you know of bugs that no-one else knows about, and you're just sitting on them. That would be all on you. | And yes, this bug was introduced because the API was being replaced, | and yes, the person who committed this code likely did all he could | to make sure no bug was introduced, and yes, he did still add a bug | because a certain number of these compat layers are dead wood that | can't be tested, not even by those who claim they are so important. But this one (the one above) is in code that is apparently not dead wood, and that people do use. But that people use code does not mean that every bug is going to be encountered - bugs often stay undetected for a very long time, just because no-one happens to do whatever it takes to trigger them. | In fact, all of your "rules" were respected, yet the bug still inevitably | came in. Why is that? Because bugs happen. Always have, always will. 
The only way to avoid bugs in NetBSD would be to shut it down. Delete everything. Then it would have no remaining bugs. It wouldn't be useful though. But it would end this discussion! | This is what happens with dead wood, you just don't want to face it. But, apparently, COMPAT_ULTRIX isn't dead wood, and people still use it. And that's what the one bug (now fixed) you have cited, that I have seen, was in. | And I'm not even speaking for myself, you just have to see the answers to | "does anyone use compat_xxx". You can even try as hard as I did, with | COMPAT_SVR4, ask on several different mailing lists, repeatedly, over and | over. No one ever answers. And most likely no-one ever will. You cannot get answers that way; the people on our mailing lists mostly only care about this week's (or this year's anyway) code, and wouldn't consider running anything that didn't come from available sources. But NetBSD (and all the other systems) are used by more people than that. I know of several NetBSD users who would never go anywhere near any of our mailing lists ... the closest they ever come is looking at the web site from time to time to see if there's been a new release. The only way to find out whether something is being used is to remove it from the default config, but leave it relatively easy to turn back on again (and ideally ask - where it is to be enabled - for anyone who needs it, and enables it, to send e-mail to x...@netbsd.org) That was done with the removal of the ISA graphics card support (I mean, who uses ancient PCs, new ones are cheap, and much better...) but then it turned out that someone still did. | If UVM was of questionable utility, Everything is of questionable utility. We used to not have UVM so we know it is possible to survive without it. Note I am not questioning keeping that, just your certainty that you know everything that is required, and useful, and that everything else is "dead wood" and should be deleted. 
| I stated my point clearly and logically, about why certain things have | legitimate reasons to go away, Sorry, I must have missed that. All I ever seem to see is that xxx is unmaintained and full of unspecified bugs, and that obviously no-one cares, and so we should delete it. That's not an argument for anything. Also note I am not arguing for keeping anything in particular, just against your reasons for deleting it. If some functionality really is of no further use (and some of the drivers that were recently removed seem to be in that category) then by all means, we should remove it. But that you don't use it, and no-one on the mailing lists admits to using it, does not mean no-one uses it. You need to move much more slowly. I'd suggest perhaps
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 14:43, Johnny Billquist wrote: On 2019-03-16 14:28, Maxime Villard wrote: I stated my point clearly and logically, about why certain things have legitimate reasons to go away, regardless of whether they are compat layers, or drivers, or something else. Rather than giving clear, logical counter arguments, you are just repeating "XXX is YYY years old, remove it". You are going to have to provide better arguments I'm afraid, because you're not going to convince _*anyone*_ with that. I think that what Robert, and others (including me), argue is actually that things should not be removed, and the reason would be that this is the core mission, purpose, reason (or whatever you want to call it) for NetBSD's existence. Instead it should be fixed, because that is what it all is about. Make it work - don't remove it. It seems our disagreement comes down to that, indeed. It seems to me this has already been answered, too, in previous discussions. "Make it work" -> No one has volunteered to make it work. No one is maintaining it, no one is actively using it, no one is testing it. It's simple: no one is here to "make it work". "Don't remove it" -> If you have a suggestion on what to do, feel free to tell me about it. As was said numerous times, dead wood consumes APIs and makes stuff harder to change; the dead wood naturally deprecates even more because we all make blind changes. It then reaches a state where it's broken beyond repair, and there's just nothing else to do than retire it. It may not be the case for OSF1 right now, but it was certainly the case for SVR4, and it will inevitably be the case for the few esoteric compat layers we still have. It's good to think what you think, but you need to face reality. The reality is that no one wants to take care of our compat layers (compat_linux being an exception, maybe). No one has taken care of that in the last 10 years either. 
Compat layers may have been a USP (unique selling point) 20+ years ago, but a good number of them are in a deplorable state nowadays, in addition to being of questionable utility, to say the least. And don't even try to say "uh but this is YOUR reality and not mine", it would just be ridiculous; these things have been discussed many times, and it's no rocket science.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 13:53, Robert Elz wrote: Whoever is changing them should be fixing all the users of those APIs, including the ones in the compat code. Consider all the work PaulG did as part of the kernel module changes recently -- that's an example of how it should be done. Simply deciding an API should alter, and changing the few places that you care about is not good enough. (generic "you" not personal there). [... and the rest ...] Ok. So you believe that dead wood should hold back all of the development process? See below: | I've already given this example (and it is one among many others): | https://mail-index.netbsd.org/source-changes/2017/12/03/msg090179.html That's not an example of anything useful. There was a bug. It was (apparently) reported. It was fixed (you fixed it!) That it hadn't been reported for a long time merely meant that no-one had noticed it. In that case, that's not surprising, the bug was when copyout() failed, which in practice is a very rare event - as it normally requires bugs (and often really dumb bugs) in the application as well. I believe you misread, 'iter' was not initialized, there is a goto earlier, this could have been an exploitable vulnerability. As I said, this is one example among others. And yes, this bug was introduced because the API was being replaced, and yes, the person who committed this code likely did all he could to make sure no bug was introduced, and yes, he did still add a bug because a certain number of these compat layers are dead wood that can't be tested, not even by those who claim they are so important. In fact, all of your "rules" were respected, yet the bug still inevitably came in. Why is that? This is what happens with dead wood, you just don't want to face it. The reality is that very, very, very few (if any) people care about our esoteric compat layers; no one is testing them, no one is actively using them, no one even ever reads their code. 
And I'm not even speaking for myself, you just have to see the answers to "does anyone use compat_xxx". You can even try as hard as I did, with COMPAT_SVR4, ask on several different mailing lists, repeatedly, over and over. No one ever answers. A similar example would be the mlock(addr, 0) bug that was fixed the other day. That had also been there for ages (perhaps since UVM was added). Should the existence of that bug for such a long time be a justification for removing UVM ? If UVM was of questionable utility, yes, absolutely. Except it isn't. The compat layers we've talked about (SVR4, IBCS2, now OSF1) _are_ of questionable utility, as discussed. I stated my point clearly and logically, about why certain things have legitimate reasons to go away, regardless of whether they are compat layers, or drivers, or something else. Rather than giving clear, logical counter arguments, you are just repeating "XXX is YYY years old, remove it". You are going to have to provide better arguments I'm afraid, because you're not going to convince _*anyone*_ with that.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 12:05, Johnny Billquist wrote: I don't see any confusion. My question was about what NetBSD is about. If we are serious about being interoperable, implementing many standard APIs, and care about different platforms, then buggy or broken code should be fixed. Or at worst left for someone else to fix later. If we don't care about the interoperability, then indeed, we delete the code. This misses absolutely 100% of the points that were given in the many discussions that have been had on this topic. Please re-read the threads. In short, there are things that are a burden in terms of maintenance, and that no one is willing to fix, because no one is interested in doing so. These things have legitimate reasons to go away. Similarly, there are things that are hindrances to overall design improvement, like MPification; and when such things have questionable utility and receive zero maintenance, they also have legitimate reasons to go away. This has been said very clearly multiple times. If that's your definition of interoperable, then indeed, we used to be - and still are a bit, actually - a very, very interoperable system. Do you mean that deleting the code is your definition of interoperable? You are confused.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 10:12, Johnny Billquist wrote: On 2019-03-16 09:45, Maxime Villard wrote: On 16/03/2019 at 06:23, John Nemeth wrote: On Mar 15, 10:31pm, Michael Kronsteiner wrote: } } i have this discussion today aswell... considering 64/32bit machines. } if you want ultrix, install ultrix. if you want osf1/dec unix/tru64 } install that. being able to run ummm nearly 20 year old binaries... } well. if thats what you want be prepared for a ride. i never ran } "foreign" binaries on a BSD. and i often compile myself even on more } "user friendly" systems. By any chance, have you seen our About page: http://www.netbsd.org/about/ ? The second paragraph reads thus: - One of the primary focuses of the NetBSD project has been to make the base OS highly portable. This has resulted in NetBSD being ported to a large number of hardware platforms. NetBSD is also interoperable, implementing many standard APIs and network protocols, and emulating many other systems' ABIs. - Emulating other systems is fundamental to what NetBSD is about. This is a really simplistic answer. It is not difficult to see that our website does not reflect reality at all. So, what is the reality then, in your opinion? The reality is that our compat layers are largely unmaintained and broken. One would have to be very dumb to believe that this constitutes a USP.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 12:17, Robert Elz wrote: If there are bug reports that are not being attended to (open PRs), that's different. Otherwise unmaintained code is simply code that doesn't need changes (which for emulation of ancient systems is not a huge surprise - those systems aren't changing any more; if the emulation is adequate for its purpose - which does not mean it has to be complete - then there is no reason for it to ever change again.) I understand this point. But to me it is deeply wrong: the compat layers use system APIs, and these APIs do change regularly. You can't change the VFS locking without changing the implementations of each compat layer; yet indeed the emulated systems aren't changing anymore. I've already given this example (and it is one among many others): https://mail-index.netbsd.org/source-changes/2017/12/03/msg090179.html From my point of view (of someone who did want to change the mbuf API for instance, and who realized it was just impossible given the ton of dead wood that uses it), in these cases, we are forced to take the time to make blind changes which may be wrong. It costs us the effort, and makes us add bugs. If the situation was as you described, I wouldn't mind. But to me, the old unmaintained code is a significant barrier to many improvements; so when the code in question has questionable utility or is even known to be broken, there need to be talks about removing it. Hence my emails. Also, this thread seems to have diverted towards "interoperability", but I don't see what the debate is here. Dead wood is dead wood, regardless of whether it is a compat layer, or a driver, or a network protocol, or even a complete architecture (as has been done in the past already). We can disagree on the deadness of it, but I don't see the debate about interoperability. 
And as I said, I believe it is futile to talk about interoperability in the case of broken features (eg SVR4); the thing was completely broken, so it didn't make NetBSD more interoperable. Maybe the only thing it did was make NetBSD more buggy.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 11:26, Johnny Billquist wrote: If the answer is that we remove the code, then indeed, the whole webpage is incorrect, and we should change it to state that we do not try to be interoperable, implementing many standard APIs, or care about other platforms. There seems to be some confusion here. Being "interoperable" doesn't mean "keeping unmaintained code that is so broken it can't even emulate a dumbass mmap". (Thinking about SVR4 here.) If that's your definition of interoperable, then indeed, we used to be - and still are a bit, actually - a very, very interoperable system. Regarding the web site, if my memory is correct, not too long ago we did change the page to put less emphasis on the compat layers, but the changes were backed out because one person wanted to keep the old version; if my memory is correct, this person was John Nemeth.
Re: Regarding the ULTRIX and OSF1 compats
On 16/03/2019 at 06:23, John Nemeth wrote: On Mar 15, 10:31pm, Michael Kronsteiner wrote: } } i have this discussion today aswell... considering 64/32bit machines. } if you want ultrix, install ultrix. if you want osf1/dec unix/tru64 } install that. being able to run ummm nearly 20 year old binaries... } well. if thats what you want be prepared for a ride. i never ran } "foreign" binaries on a BSD. and i often compile myself even on more } "user friendly" systems. By any chance, have you seen our About page: http://www.netbsd.org/about/ ? The second paragraph reads thus: - One of the primary focuses of the NetBSD project has been to make the base OS highly portable. This has resulted in NetBSD being ported to a large number of hardware platforms. NetBSD is also interoperable, implementing many standard APIs and network protocols, and emulating many other systems' ABIs. - Emulating other systems is fundamental to what NetBSD is about. This is a really simplistic answer. It is not difficult to see that our website does not reflect reality at all. Regarding COMPAT_OSF1: I'm not totally sure, but it seems that Alpha's COMPAT_LINUX uses COMPAT_OSF1 as a dependency (even if there is no proper dependency in the module), because there are osf1_* calls. Some more compat mess to untangle, it seems... In all cases, it's only a few functions that are just wrappers, so it probably shouldn't be too complicated to solve.
pool: removing ioff?
I would like to remove the 'ioff' argument from pool_init() and friends, documented as 'align_offset' in the man page. This parameter allows the caller to specify that the alignment given in 'align' is to be applied at the offset 'ioff' within the buffer. The two users I could find are: * sys/arch/arm/xscale/iopaau.c: iopaau_desc_4_cache and iopaau_desc_8_cache. * sys/arch/arm/arm32/pmap.c: pmap_l2ptp_cache, at one point, and it was actually a bug: the arguments were inverted... I think we're better off with hard-aligned structures, i.e. with __aligned(32) in the case of XSCALE. Then we just pass align=32 in the pool, and that's it. I would prefer to avoid any confusion in the pool initializers and drop ioff, rather than having this kind of marginal and not-well-defined feature that adds complexity with no real good reason. Note also that, as far as I can tell, our policy in the kernel has always been to hard-align the structures, and then pass the same alignment in the allocators.
Re: Regarding the ULTRIX and OSF1 compats
On 10/03/2019 at 17:39, Maxime Villard wrote: There's been more emphasis on the fact that COMPAT_OSF1 should go. Do people have anything to say about that? So, no one? I will remove it soon...
Re: Regarding the ULTRIX and OSF1 compats
Please read _*all*_ the answers, because I've already said that I was fine with keeping COMPAT_ULTRIX. If someone has been able to even run a full ULTRIX userland, then it's probably functional enough, and we can keep it for the time being. Case closed.