Re: Audio buffer positions
dam...@zamaudio.com (Damien Zammit) writes:

> Can we use the arrival of audio IRQs to trigger the measurement of
> current buffer positions and high resolution timer timestamp combined
> with system time, and use these measurements to synchronise a
> delay-locked-loop (DLL) to provide sub-sample accurate estimates of
> buffer position in between the arrival of audio IRQs?

Not sure what a "sub-sample" buffer position would be.

Currently you get the position with "block" precision, and for regular
hardware that's "sampling the buffer pointer in the audio interrupt".
The "block" size is automatically chosen according to the configured
latency parameter (within bounds); the default precision is 40ms.
Reduce it to 4ms (with sysctl) and the result is good enough for
anything, and the overhead is still rather small (does a VAX have
audio?).

What you could do is interpolate the buffer pointer using the system
time and the configured bitrate; you need to be careful to keep it
monotonic. Here is a patch that does this:

http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/audio.diff

It adds a sysctl node that lets you choose between 3 modes to calculate
the buffer position:

0 - report the buffer start
1 - interpolate between start and end using time / bitrate
2 - report the buffer end (that's the unpatched behaviour)

The interpolation mode gets you precise timing... if your application
uses the buffer offset correctly. But it won't help for others that
just want precise wakeup times. For these you actually need the
smaller buffers.

-- 
                                Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."
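The interpolation mode described in the message above - advance the last sampled buffer pointer by elapsed time times bitrate, while staying monotonic and never running past the end of the current block - can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch; all names (`pos_state`, `interp_position`, etc.) are invented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of interpolating a buffer position between audio interrupts.
 * The position is clamped so it stays monotonic and never reports past
 * the end of the current block.  Illustrative only.
 */
struct pos_state {
	uint64_t last_irq_ns;	/* system time at the last audio IRQ */
	uint64_t block_start;	/* byte offset reported at that IRQ */
	uint64_t block_end;	/* byte offset of the end of the block */
	uint64_t bytes_per_sec;	/* configured bitrate / 8 */
	uint64_t last_reported;	/* remembered for monotonicity */
};

static uint64_t
interp_position(struct pos_state *ps, uint64_t now_ns)
{
	uint64_t elapsed_ns = now_ns - ps->last_irq_ns;
	uint64_t advance = elapsed_ns * ps->bytes_per_sec / 1000000000ULL;
	uint64_t pos = ps->block_start + advance;

	/* Never report past the end of the current block... */
	if (pos > ps->block_end)
		pos = ps->block_end;
	/* ...and never go backwards. */
	if (pos < ps->last_reported)
		pos = ps->last_reported;
	ps->last_reported = pos;
	return pos;
}
```

On each audio IRQ the driver would refresh `last_irq_ns`, `block_start` and `block_end`; between IRQs, readers get a smoothly advancing estimate instead of a value that jumps once per block.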
Re: Why NetBSD x86's bus_space_barrier does not use [sml]fence?
Thank you for teaching me about the background. I was able to test the
driver with the related changes applied.

Thanks,
yamaguchi
Audio buffer positions
Hi there,

I would like to suggest an idea for improving the latency reporting to
userland for the audio API:

Can we use the arrival of audio IRQs to trigger the measurement of
current buffer positions and a high resolution timer timestamp combined
with system time, and use these measurements to synchronise a
delay-locked loop (DLL) to provide sub-sample accurate estimates of
buffer position in between the arrival of audio IRQs?

In this way, the AUDIO_GETIOFFS and AUDIO_GETOOFFS ioctls could return
sub-sample accurate buffer positions. I don't think the external-facing
API would even need to change at all.

-- 
Damien Zammit
ZamAudio
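For illustration, the kind of second-order delay-locked loop commonly used in audio software (e.g. as described by Fons Adriaensen for JACK) could be sketched as below. This is a generic userland sketch of the filter itself, not NetBSD kernel code; the structure and names are invented, and the loop coefficients are the usual critically-damped choice.

```c
/*
 * Generic second-order DLL: smooth noisy per-IRQ timestamps into a
 * filtered (time, period) estimate.  t0/t1 are the filtered times of
 * the current and next expected IRQ; between IRQs the buffer position
 * can be interpolated from them with sub-sample resolution.
 */
struct dll {
	double b, c;	/* loop filter coefficients */
	double t0, t1;	/* filtered time of current / next IRQ */
	double nper;	/* filtered period estimate (seconds) */
};

static void
dll_init(struct dll *d, double now, double period, double bandwidth_hz)
{
	double omega = 2.0 * 3.14159265358979323846 * bandwidth_hz * period;

	d->b = 1.41421356237309504880 * omega;	/* sqrt(2) * omega */
	d->c = omega * omega;
	d->t0 = now;
	d->nper = period;
	d->t1 = now + period;
}

static void
dll_update(struct dll *d, double now)
{
	double e = now - d->t1;	/* error: measured vs. predicted */

	d->t0 = d->t1;
	d->nper += d->c * e;		/* adjust period estimate */
	d->t1 += d->nper + d->b * e;	/* predict the next IRQ time */
}
```

Fed the (jittery) timestamp of each audio IRQ, the loop converges on the true IRQ period, and the fraction `(now - t0) / (t1 - t0)` gives a sub-block position estimate between interrupts.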
Re: modules item #14 revisited
In article <20191207024224.1b0d417f...@rebar.astron.com>,
Christos Zoulas wrote:
>
> Hi,
>
> This is a quick and dirty implementation of:
>
> http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html
>
> to use:
> $ echo KERNEL_DIR=yes >> /etc/mk.conf
> # apply the enclosed patch
> $ mv /netbsd{,.old}
> $ mkdir -p /netbsd/modules
> $ make && make install in src/share/mk
> # build a new kernel and put it in /netbsd/kernel
> $ make && make install in src/sys/modules
> $ make && make install in sys/arch/i386/stand/boot
> $ cp /usr/mdec/boot /
>
> There are quite a few open issues:
> - All ports need to provide get_booted_kernel()
>   Are there ports where we can't get the booted kernel name? What do
>   we do there?
> - Make install in modules always overwrites the modules in /netbsd/
> - Does each kernel.tgz need to pack all the modules in the sets or
>   do we keep the modules.tgz that unpacks in /netbsd/modules?
> - How about the other kernels? Do they unpack in /KERNEL-NAME/kernel?
> - Right now there is fallback code in the boot blocks for just the
>   kernel but not the old module path (this is intentional to keep
>   things simple). You can always boot a kernel by name from a file.
>
> christos
>
> Index: share/mk/bsd.kmodule.mk
> ===
> RCS file: /cvsroot/src/share/mk/bsd.kmodule.mk,v
> retrieving revision 1.63
> diff -u -p -u -r1.63 bsd.kmodule.mk
> --- share/mk/bsd.kmodule.mk	1 Dec 2019 20:24:47 -	1.63
> +++ share/mk/bsd.kmodule.mk	7 Dec 2019 02:26:01 -
> @@ -172,7 +172,13 @@ ${PROG}: ${OBJS} ${DPADD} ${KMODSCRIPT}
>  # Install rules
>  .if !target(kmodinstall)
>  .if !defined(KMODULEDIR)
> +.if ${KERNEL_DIR:Uno} == "yes"
> +KMODULEDIR=	${DESTDIR}/netbsd/modules/${KMOD}
> +_INST_DIRS=	${DESTDIR}/netbsd
> +_INST_DIRS+=	${DESTDIR}/netbsd/modules
> +_INST_DIRS+=	${DESTDIR}/netbsd/modules/${KMOD}
>  _OSRELEASE!=	${HOST_SH} $S/conf/osrelease.sh -k
> +.else
>  # Ensure these are recorded properly in METALOG on unprived installes:
>  KMODULEARCHDIR?= ${MACHINE}
>  _INST_DIRS=	${DESTDIR}/stand/${KMODULEARCHDIR}
> @@ -180,6 +186,7 @@ _INST_DIRS+=	${DESTDIR}/stand/${KMODULEA
>  _INST_DIRS+=	${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules
>  KMODULEDIR=	${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules/${KMOD}
>  .endif
> +.endif
>  _PROG:=	${KMODULEDIR}/${PROG}	# installed path
> 
>  .if ${MKUPDATE} == "no"
> Index: sys/arch/i386/stand/boot/Makefile.boot
> ===
> RCS file: /cvsroot/src/sys/arch/i386/stand/boot/Makefile.boot,v
> retrieving revision 1.73
> diff -u -p -u -r1.73 Makefile.boot
> --- sys/arch/i386/stand/boot/Makefile.boot	13 Sep 2019 02:19:45 -	1.73
> +++ sys/arch/i386/stand/boot/Makefile.boot	7 Dec 2019 02:26:01 -
> @@ -53,6 +53,10 @@ CFLAGS+= -Wall -Wmissing-prototypes -Wst
>  CPPFLAGS+= -nostdinc -D_STANDALONE
>  CPPFLAGS+= -I$S
> 
> +.if ${KERNEL_DIR:Uno} == "yes"
> +CPPFLAGS+= -DKERNEL_DIR
> +.endif
> +
>  CPPFLAGS+= -DSUPPORT_PS2
>  CPPFLAGS+= -DDIRECT_SERIAL
>  CPPFLAGS+= -DSUPPORT_SERIAL=boot_params.bp_consdev
> Index: sys/arch/i386/stand/boot/boot2.c
> ===
> RCS file: /cvsroot/src/sys/arch/i386/stand/boot/boot2.c,v
> retrieving revision 1.72
> diff -u -p -u -r1.72 boot2.c
> --- sys/arch/i386/stand/boot/boot2.c	2 Sep 2019 06:10:24 -	1.72
> +++ sys/arch/i386/stand/boot/boot2.c	7 Dec 2019 02:26:01 -
> @@ -279,6 +279,12 @@ bootit(const char *filename, int howto)
>  	if (howto & AB_VERBOSE)
>  		printf("booting %s (howto 0x%x)\n", sprint_bootsel(filename),
>  		    howto);
> +#ifdef KERNEL_DIR
> +	char path[512];
> +	strcpy(path, filename);
> +	strcat(path, "/kernel");
> +	(void)exec_netbsd(path, 0, howto, boot_biosdev < 0x80, clearit);
> +#endif
> 
>  	if (exec_netbsd(filename, 0, howto, boot_biosdev < 0x80, clearit) < 0)
>  		printf("boot: %s: %s\n", sprint_bootsel(filename),
> Index: sys/arch/i386/stand/lib/exec.c
> ===
> RCS file: /cvsroot/src/sys/arch/i386/stand/lib/exec.c,v
> retrieving revision 1.74
> diff -u -p -u -r1.74 exec.c
> --- sys/arch/i386/stand/lib/exec.c	13 Sep 2019 02:19:46 -	1.74
> +++ sys/arch/i386/stand/lib/exec.c	7 Dec 2019 02:26:01 -
> @@ -151,7 +151,7 @@ static void	module_add_common(const char
>  static void	userconf_init(void);
> 
>  static void	extract_device(const char *, char *, size_t);
> -static void	module_base_path(char *, size_t);
> +static void	module_base_path(char *, size_t, const char *);
>  static int	module_open(boot_module_t *, int, const char *, const char *,
>  	    bool);
> 
> @@ -653,8 +653,18 @@ module_open(boot_module_t *bm, int mode,
>  }
> 
>  static void
> -module_base_path(char *buf, size_t bufsize)
> +module_base_path(char *buf, size_t bufsize, const char
modules item #14 revisited
Hi,

This is a quick and dirty implementation of:

http://mail-index.NetBSD.org/current-users/2009/05/10/msg009372.html

to use:

$ echo KERNEL_DIR=yes >> /etc/mk.conf
# apply the enclosed patch
$ mv /netbsd{,.old}
$ mkdir -p /netbsd/modules
# build a new kernel and put it in /netbsd/kernel
$ make && make install in src/sys/modules
$ make && make install in sys/arch/i386/stand/boot
$ cp /usr/mdec/boot /

There are quite a few open issues:

- All ports need to provide get_booted_kernel()
  Are there ports where we can't get the booted kernel name? What do we
  do there?
- Make install in modules always overwrites the modules in /netbsd/
- Does each kernel.tgz need to pack all the modules in the sets or do
  we keep the modules.tgz that unpacks in /netbsd/modules?
- How about the other kernels? Do they unpack in /KERNEL-NAME/kernel?
- Right now there is fallback code in the boot blocks for just the
  kernel but not the old module path (this is intentional to keep
  things simple). You can always boot a kernel by name from a file.
christos

Index: share/mk/bsd.kmodule.mk
===
RCS file: /cvsroot/src/share/mk/bsd.kmodule.mk,v
retrieving revision 1.63
diff -u -p -u -r1.63 bsd.kmodule.mk
--- share/mk/bsd.kmodule.mk	1 Dec 2019 20:24:47 -	1.63
+++ share/mk/bsd.kmodule.mk	7 Dec 2019 02:26:01 -
@@ -172,7 +172,13 @@ ${PROG}: ${OBJS} ${DPADD} ${KMODSCRIPT}
 # Install rules
 .if !target(kmodinstall)
 .if !defined(KMODULEDIR)
+.if ${KERNEL_DIR:Uno} == "yes"
+KMODULEDIR=	${DESTDIR}/netbsd/modules/${KMOD}
+_INST_DIRS=	${DESTDIR}/netbsd
+_INST_DIRS+=	${DESTDIR}/netbsd/modules
+_INST_DIRS+=	${DESTDIR}/netbsd/modules/${KMOD}
 _OSRELEASE!=	${HOST_SH} $S/conf/osrelease.sh -k
+.else
 # Ensure these are recorded properly in METALOG on unprived installes:
 KMODULEARCHDIR?= ${MACHINE}
 _INST_DIRS=	${DESTDIR}/stand/${KMODULEARCHDIR}
@@ -180,6 +186,7 @@ _INST_DIRS+=	${DESTDIR}/stand/${KMODULEA
 _INST_DIRS+=	${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules
 KMODULEDIR=	${DESTDIR}/stand/${KMODULEARCHDIR}/${_OSRELEASE}/modules/${KMOD}
 .endif
+.endif
 _PROG:=	${KMODULEDIR}/${PROG}	# installed path

 .if ${MKUPDATE} == "no"
Index: sys/arch/i386/stand/boot/Makefile.boot
===
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/Makefile.boot,v
retrieving revision 1.73
diff -u -p -u -r1.73 Makefile.boot
--- sys/arch/i386/stand/boot/Makefile.boot	13 Sep 2019 02:19:45 -	1.73
+++ sys/arch/i386/stand/boot/Makefile.boot	7 Dec 2019 02:26:01 -
@@ -53,6 +53,10 @@ CFLAGS+= -Wall -Wmissing-prototypes -Wst
 CPPFLAGS+= -nostdinc -D_STANDALONE
 CPPFLAGS+= -I$S

+.if ${KERNEL_DIR:Uno} == "yes"
+CPPFLAGS+= -DKERNEL_DIR
+.endif
+
 CPPFLAGS+= -DSUPPORT_PS2
 CPPFLAGS+= -DDIRECT_SERIAL
 CPPFLAGS+= -DSUPPORT_SERIAL=boot_params.bp_consdev
Index: sys/arch/i386/stand/boot/boot2.c
===
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/boot2.c,v
retrieving revision 1.72
diff -u -p -u -r1.72 boot2.c
--- sys/arch/i386/stand/boot/boot2.c	2 Sep 2019 06:10:24 -	1.72
+++ sys/arch/i386/stand/boot/boot2.c	7 Dec 2019 02:26:01 -
@@ -279,6 +279,12 @@ bootit(const char *filename, int howto)
 	if (howto & AB_VERBOSE)
 		printf("booting %s (howto 0x%x)\n", sprint_bootsel(filename),
 		    howto);
+#ifdef KERNEL_DIR
+	char path[512];
+	strcpy(path, filename);
+	strcat(path, "/kernel");
+	(void)exec_netbsd(path, 0, howto, boot_biosdev < 0x80, clearit);
+#endif

 	if (exec_netbsd(filename, 0, howto, boot_biosdev < 0x80, clearit) < 0)
 		printf("boot: %s: %s\n", sprint_bootsel(filename),
Index: sys/arch/i386/stand/lib/exec.c
===
RCS file: /cvsroot/src/sys/arch/i386/stand/lib/exec.c,v
retrieving revision 1.74
diff -u -p -u -r1.74 exec.c
--- sys/arch/i386/stand/lib/exec.c	13 Sep 2019 02:19:46 -	1.74
+++ sys/arch/i386/stand/lib/exec.c	7 Dec 2019 02:26:01 -
@@ -151,7 +151,7 @@ static void	module_add_common(const char
 static void	userconf_init(void);

 static void	extract_device(const char *, char *, size_t);
-static void	module_base_path(char *, size_t);
+static void	module_base_path(char *, size_t, const char *);
 static int	module_open(boot_module_t *, int, const char *, const char *,
 	    bool);

@@ -653,8 +653,18 @@ module_open(boot_module_t *bm, int mode,
 }

 static void
-module_base_path(char *buf, size_t bufsize)
+module_base_path(char *buf, size_t bufsize, const char *kernel_path)
 {
+#ifdef KERNEL_DIR
+	/* we cheat here, because %.* does not work with the mini printf */
+	char *ptr = strrchr(kernel_path, '/');
+	if (*ptr)
+		*ptr = '\0';
+
Re: racy access in kern_runq.c
Please educate me. It's been a while for me.

Writing kernel code *requires* knowledge of what code is generated,
sometimes. In my experience, there have been standard techniques, like
pragmas and insertions of assembly code, to suppress this sort of
undesirable optimization. Don't those techniques exist any more? My
compiler friends used to put them in for just this purpose, and they
tried to make them as portable as possible.

Surely GCC does this. No?

Inquiring minds...

-dgl-

> On Dec 6, 2019, at 1:44 PM,  wrote:
>
>> On Dec 6, 2019, at 10:21 AM, Mouse wrote:
>>
>> [EXTERNAL EMAIL]
>>
>>> Compilers have become much more aggressive over the years. But they
>>> are allowed to be so by the C standard. Specifically, in addition to
>>> code-level re-ordering, plain accesses (loads/stores) are subject to
>>> load/store fusing, tearing as well as invented loads/stores.
>>
>> Then, honestly, it sounds to me as though "the latest revision of C" is
>> no longer an appropriate language for writing kernels. I see no reason
>> to twist the kernel code into a pretzel to work around latitude a new
>> language gives to its compilers - and that's what C11 is, a new
>> language, albeit one closely related to various previous languages.
>>
>> One of the prime considerations when choosing a language and/or
>> compiler for building a kernel is that it produce relatively
>> predictable code, for exactly this kind of reason. If the latest C and
>> the latest gcc no longer do that, then IMO they are no longer
>> appropriate for writing/compiling kernels.
>
> C11 isn't all that new, of course. And I don't know if the rules about
> optimization that you object to are anywhere near that new in any case.
> What seems to be the case instead is that compilers are more and more
> doing what the language has for a long time allowed them to do.
>
> Consider for example "Undefined behavior -- what happened to my code?"
> by Wang et al. (MIT and Tsinghua University). It describes all sorts of
> allowed transformations that come as a surprise to programmers.
>
> Yes, it certainly would be possible to create a programming language
> with less surprising semantics. C is not in any way shape or form a
> clean language with clean semantics. But as for wanting predictable
> code, that is certainly doable. Unfortunately, it requires a
> sufficiently deep understanding of the rules of the language. Typical
> textbooks are not all that good for this; they are too superficial.
> The language standard tends to be better. But again, a problem with
> most programming languages is that their semantics are not well defined
> and/or much more complex than they should be. Programming in C++ is
> particularly scary for this reason, but C is also problematic. For
> clean semantics, I like ALGOL; too bad it is no longer used, though it
> did at one time serve to build successful operating systems.
>
> paul
>
Re: Changes to reduce pressure on uvm_pageqlock
Ok, here is a more complete patch for the change to uvm_pageqlock. I
have edited it by hand to remove lots of unrelated changes, so it's just
to give a picture of the changes - it won't apply or compile.

http://www.netbsd.org/~ad/2019/partial.diff

(It's time consuming to detangle this from the rest of my changes, which
are per-CPU stat collection, a topology-aware page allocator that
dispenses with the "per-cpu" lists we have now, and yamt's radixtree.)

There are a few more things I want to look at tomorrow, including the
PQ_READAHEAD flag (it's not safe now), whether the pagedaemon needs to
know about the pdpolicy lock at all, and a last go over it all checking
everything.

The earlier changes to vm_page I mentioned for alignment and the page
replacement policy I am going to postpone, because radixtree changes the
picture bigtime and more testing is required. (That's a separate
review.)

As to the reason why: at the start of November, on my machine, system
time for a kernel build was 3200-3300s. With this plus the remaining
changes it's down to 850s so far, and with !DIAGNOSTIC about 580s.

Andrew

On Fri, Dec 06, 2019 at 04:34:50PM +, Andrew Doran wrote:
> I ended up taking a disliking to these changes so I went back and took
> another look at the problem yesterday evening.
>
> There are two distinct sets of data involved that need protection: page
> identity (connected anon, uobj, wiring, loaning) and page replacement
> state.
>
> There are limited points of connection between those two sets, so it's
> easy to do two locks. We can give the pdpolicy code a lock for its
> state and let it manage that lock, which only it and the pagedaemon
> need to know about. The lock can be private to the file
> (uvm_pdpolicy_clock / clockpro.c), with a couple of hooks for the
> pagedaemon to acquire and release it.
>
> With that done, we'd then only need one version of uvm_pageactivate()
> etc, because those functions will no longer care about uvm_pageqlock.
>
> I tried it out: the code looks more elegant and the scheme works in
> practice with contention on uvm_pageqlock now quite minor, and the sum
> of contention on the two locks lower than it was on uvm_pageqlock
> prior.
>
> With the locking for the pdpolicy code being private to the file I
> think it would then be easier to experiment with different strategies
> to reduce contention further.
>
> http://www.netbsd.org/~ad/2019/pdpolicy.diff
>
> Andrew
>
> On Tue, Dec 03, 2019 at 12:41:25AM +, Andrew Doran wrote:
> > Hi,
> >
> > I have some straightforward changes for uvm to improve the situation
> > on many-CPU machines. I'm going to break them into pieces to make
> > them easier to review (only this first piece and what's already in
> > CVS is ready).
> >
> > I have carefully measured the impact of these over hundreds of kernel
> > builds, using lockstat, tprof and some custom instrumentation, so I'm
> > confident that for each, the effects at least are of value.
> >
> > Anyway I'd be grateful if someone could take a look. This one is
> > about reducing pressure on uvm_pageqlock, and cache misses on struct
> > vm_page.
> >
> > Cheers,
> > Andrew
> >
> >
> > http://www.netbsd.org/~ad/2019/uvm1.diff
> >
> >
> > vm_page: cluster largely static fields used during page lookup in the
> > first 64 bytes. Increase wire_count to 32 bits, and add a field for
> > use of the page replacement policy. This leaves 2 bytes spare in a
> > word locked by uvm_fpageqlock/uvm_pageqlock which I want to use later
> > for changes to the page allocator. It also brings vm_page up to 128
> > bytes on amd64.
> >
> > New functions:
> >
> > => uvmpdpol_pageactivate_p()
> >
> > For page replacement policy. Returns true if pdpol thinks activation
> > info would be useful enough to cause disruption to page queues,
> > vm_page and uvm_fpageqlock. For CLOCK this returns true if page is
> > not active, or was not activated within the last second.
> >
> > => uvm_pageenqueue1()
> >
> > Call without uvm_pageqlock. Acquires the lock and enqueues the page
> > only if not already enqueued.
> >
> > => uvm_pageactivate1()
> >
> > Call without uvm_pageqlock. Acquires the lock and activates the page
> > only if uvmpdpol_pageactivate_p() says yes. No similar change for
> > deactivate nor dequeue as they are much more definite.
> >
> > => uvm_pagefree1()
> >
> > First part of uvm_pagefree() - strip page of identity. Requires
> > uvm_pageqlock if associated with an object.
> >
> > => uvm_pagefree2()
> >
> > Second part of uvm_pagefree(). Send page to free list. No locks
> > required.
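The two-phase free quoted above (strip identity under the queue lock, then return the page to a free list with no lock held) follows a common pattern: do the minimum work under the contended lock, then finish outside it. A generic sketch of that pattern, with invented names and a toy single-threaded free list standing in for the real allocator (the actual uvm code is of course different):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for vm_page and its locking. */
struct page {
	void		*owner;		/* identity: owning object, if any */
	struct page	*freenext;	/* free-list linkage */
};

static atomic_flag pageq_lock = ATOMIC_FLAG_INIT;	/* toy spinlock */
static struct page *freelist;	/* a real allocator would use per-CPU
				 * lists or atomics here */

/* Phase 1: strip the page of its identity.  Needs the queue lock only
 * when the page is associated with an object. */
static void
page_free_phase1(struct page *pg)
{
	pg->owner = NULL;
}

/* Phase 2: send the page to the free list.  No queue lock required. */
static void
page_free_phase2(struct page *pg)
{
	pg->freenext = freelist;
	freelist = pg;
}

static void
page_free(struct page *pg)
{
	bool owned = (pg->owner != NULL);

	if (owned)
		while (atomic_flag_test_and_set(&pageq_lock))
			;		/* spin */
	page_free_phase1(pg);
	if (owned)
		atomic_flag_clear(&pageq_lock);
	page_free_phase2(pg);		/* lock no longer held */
}
```

The benefit is that the expensive part (touching the free lists) is moved out of the critical section, so the contended lock is held only for the identity update.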
Re: racy access in kern_runq.c
> On Dec 6, 2019, at 11:44 AM, paul.kon...@dell.com wrote:
>
> For clean semantics, I like ALGOL; too bad it is no longer used

There's just too much shouting in ALGOL.

-- thorpej
Re: racy access in kern_runq.c
> On Dec 6, 2019, at 10:21 AM, Mouse wrote:
>
> [EXTERNAL EMAIL]
>
>> Compilers have become much more aggressive over the years. But they
>> are allowed to be so by the C standard. Specifically, in addition to
>> code-level re-ordering, plain accesses (loads/stores) are subject to
>> load/store fusing, tearing as well as invented loads/stores.
>
> Then, honestly, it sounds to me as though "the latest revision of C" is
> no longer an appropriate language for writing kernels. I see no reason
> to twist the kernel code into a pretzel to work around latitude a new
> language gives to its compilers - and that's what C11 is, a new
> language, albeit one closely related to various previous languages.
>
> One of the prime considerations when choosing a language and/or
> compiler for building a kernel is that it produce relatively
> predictable code, for exactly this kind of reason. If the latest C and
> the latest gcc no longer do that, then IMO they are no longer
> appropriate for writing/compiling kernels.

C11 isn't all that new, of course. And I don't know if the rules about
optimization that you object to are anywhere near that new in any case.
What seems to be the case instead is that compilers are more and more
doing what the language has for a long time allowed them to do.

Consider for example "Undefined behavior -- what happened to my code?"
by Wang et al. (MIT and Tsinghua University). It describes all sorts of
allowed transformations that come as a surprise to programmers.

Yes, it certainly would be possible to create a programming language
with less surprising semantics. C is not in any way shape or form a
clean language with clean semantics. But as for wanting predictable
code, that is certainly doable. Unfortunately, it requires a
sufficiently deep understanding of the rules of the language. Typical
textbooks are not all that good for this; they are too superficial. The
language standard tends to be better. But again, a problem with most
programming languages is that their semantics are not well defined
and/or much more complex than they should be. Programming in C++ is
particularly scary for this reason, but C is also problematic. For
clean semantics, I like ALGOL; too bad it is no longer used, though it
did at one time serve to build successful operating systems.

paul
Re: racy access in kern_runq.c
On Fri, Dec 06, 2019 at 09:00:33AM +, Andrew Doran wrote:
>
> Please don't commit this. These accesses are racy by design. There is
> no safety issue and we do not want to disturb the activity of other
> CPUs in this code path by locking them. We also don't want to use
> atomics either.

I'm sorry, at this point I can't help myself:

	/* You are not expected to understand this. */

Thor
Re: racy access in kern_runq.c
Maxime Villard wrote:

> On 06/12/2019 at 17:53, Andrew Doran wrote:
> > On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote:
> >
> > >	/* Update the worker */
> > > -	worker_ci = hci;
> > > +	atomic_swap_ptr(&worker_ci, hci);
> >
> > Why atomic_swap_ptr() not atomic_store_relaxed()? I don't see any bug
> > that it fixes. Other than that it looks OK to me.
>
> Because I suggested it; my concern was that if not explicitly atomic,
> the cpu could make two writes internally (even though the compiler
> produces only one instruction), and in that case a page fault would
> have been possible because of garbage dereference.

No, atomic_store_relaxed() is sufficient; that is what it is for.

-- 
Mindaugas
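NetBSD's atomic_store_relaxed()/atomic_load_relaxed() correspond to C11 relaxed atomic accesses: a single, untorn load or store with no compiler-invented accesses, but no ordering guarantees against other memory operations. A rough portable illustration in plain C11 (not the kernel's actual implementation; `worker` and the helpers are invented names):

```c
#include <stdatomic.h>

/*
 * A relaxed atomic store/load is guaranteed to happen as one full-width
 * memory operation -- no tearing, fusing, or invented accesses -- but
 * it imposes no ordering on surrounding loads and stores.  That is
 * exactly what a "racy by design" single-word update like the
 * worker_ci assignment needs.
 */
static _Atomic(int *) worker;

static void
set_worker(int *p)
{
	/* analogous to atomic_store_relaxed(&worker, p) */
	atomic_store_explicit(&worker, p, memory_order_relaxed);
}

static int *
get_worker(void)
{
	/* analogous to atomic_load_relaxed(&worker) */
	return atomic_load_explicit(&worker, memory_order_relaxed);
}
```

Unlike atomic_swap_ptr(), which is a read-modify-write and typically compiles to a locked instruction on x86, a relaxed store compiles to a plain full-width store on common architectures, so it costs nothing extra while still constraining the compiler.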
Re: racy access in kern_runq.c
On Fri, Dec 06, 2019 at 06:33:32PM +0100, Maxime Villard wrote:
> On 06/12/2019 at 17:53, Andrew Doran wrote:
> > Why atomic_swap_ptr() not atomic_store_relaxed()? I don't see any bug
> > that it fixes. Other than that it looks OK to me.
>
> Because I suggested it; my concern was that if not explicitly atomic,
> the cpu could make two writes internally (even though the compiler
> produces only one instruction), and in that case a page fault would
> have been possible because of garbage dereference.

atomic_store_relaxed() is not explicitly atomic?

Dave

-- 
David Young
dyo...@pobox.com	Urbana, IL	(217) 721-9981
Re: racy access in kern_runq.c
On 06/12/2019 at 17:53, Andrew Doran wrote:
> On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote:
>
> >	/* Update the worker */
> > -	worker_ci = hci;
> > +	atomic_swap_ptr(&worker_ci, hci);
>
> Why atomic_swap_ptr() not atomic_store_relaxed()? I don't see any bug
> that it fixes. Other than that it looks OK to me.

Because I suggested it; my concern was that if not explicitly atomic,
the CPU could make two writes internally (even though the compiler
produces only one instruction), and in that case a page fault would have
been possible because of garbage dereference.

To be honest, I'm not totally sure whether it is a valid concern;
theoretically, it is.
Re: racy access in kern_runq.c
On Fri, Dec 06, 2019 at 05:22:39PM +0900, Kengo NAKAHARA wrote:

> 	/* Update the worker */
> -	worker_ci = hci;
> +	atomic_swap_ptr(&worker_ci, hci);

Why atomic_swap_ptr() not atomic_store_relaxed()? I don't see any bug
that it fixes. Other than that it looks OK to me.

Cheers,
Andrew
Re: Changes to reduce pressure on uvm_pageqlock
I ended up taking a disliking to these changes so I went back and took
another look at the problem yesterday evening.

There are two distinct sets of data involved that need protection: page
identity (connected anon, uobj, wiring, loaning) and page replacement
state.

There are limited points of connection between those two sets, so it's
easy to do two locks. We can give the pdpolicy code a lock for its
state and let it manage that lock, which only it and the pagedaemon need
to know about. The lock can be private to the file (uvm_pdpolicy_clock /
clockpro.c), with a couple of hooks for the pagedaemon to acquire and
release it.

With that done, we'd then only need one version of uvm_pageactivate()
etc, because those functions will no longer care about uvm_pageqlock.

I tried it out: the code looks more elegant and the scheme works in
practice with contention on uvm_pageqlock now quite minor, and the sum
of contention on the two locks lower than it was on uvm_pageqlock prior.

With the locking for the pdpolicy code being private to the file I think
it would then be easier to experiment with different strategies to
reduce contention further.

http://www.netbsd.org/~ad/2019/pdpolicy.diff

Andrew

On Tue, Dec 03, 2019 at 12:41:25AM +, Andrew Doran wrote:
> Hi,
>
> I have some straightforward changes for uvm to improve the situation on
> many-CPU machines. I'm going to break them into pieces to make them
> easier to review (only this first piece and what's already in CVS is
> ready).
>
> I have carefully measured the impact of these over hundreds of kernel
> builds, using lockstat, tprof and some custom instrumentation, so I'm
> confident that for each, the effects at least are of value.
>
> Anyway I'd be grateful if someone could take a look. This one is about
> reducing pressure on uvm_pageqlock, and cache misses on struct vm_page.
>
> Cheers,
> Andrew
>
>
> http://www.netbsd.org/~ad/2019/uvm1.diff
>
>
> vm_page: cluster largely static fields used during page lookup in the
> first 64 bytes. Increase wire_count to 32 bits, and add a field for use
> of the page replacement policy. This leaves 2 bytes spare in a word
> locked by uvm_fpageqlock/uvm_pageqlock which I want to use later for
> changes to the page allocator. It also brings vm_page up to 128 bytes
> on amd64.
>
> New functions:
>
> => uvmpdpol_pageactivate_p()
>
> For page replacement policy. Returns true if pdpol thinks activation
> info would be useful enough to cause disruption to page queues, vm_page
> and uvm_fpageqlock. For CLOCK this returns true if page is not active,
> or was not activated within the last second.
>
> => uvm_pageenqueue1()
>
> Call without uvm_pageqlock. Acquires the lock and enqueues the page
> only if not already enqueued.
>
> => uvm_pageactivate1()
>
> Call without uvm_pageqlock. Acquires the lock and activates the page
> only if uvmpdpol_pageactivate_p() says yes. No similar change for
> deactivate nor dequeue as they are much more definite.
>
> => uvm_pagefree1()
>
> First part of uvm_pagefree() - strip page of identity. Requires
> uvm_pageqlock if associated with an object.
>
> => uvm_pagefree2()
>
> Second part of uvm_pagefree(). Send page to free list. No locks
> required.
Re: racy access in kern_runq.c
On Fri, Dec 06, 2019 at 02:55:47PM +, Mindaugas Rasiukevicius wrote:

> Compilers have become much more aggressive over the years. But they are
> allowed to be so by the C standard. Specifically, in addition to
> code-level re-ordering,

Yup, the rules around that are well understood.

> plain accesses (loads/stores) are subject to load/store fusing, tearing
> as well as invented loads/stores. At least load/store fusing and
> tearing *have* been observed in reality [1] [2] [3]. So, they are *not*
> merely theoretical or esoteric problems, although there are plenty of
> these in the C11 memory model too [4] [5].

Thank you for the references, very interesting - if distressing -
reading. So it does happen.

> Linux kernel developers went through this already. Perhaps the C
> standard will plug the holes or perhaps compilers will just change
> their behaviour, as they get enough criticism [6] [7]. However, in the
> mean time, I think let's just accept that things moved on and let's
> embrace the new primitives. While these primitives might be slightly
> verbose, they are in C11, they fix real bugs, they definitely make code
> less error-prone and they have other merits too (e.g. they accommodate
> static analysers which find some real bugs).

Well, I'm firmly of the opinion that this is a bug in the compiler. By
all means a nice behaviour to have as an option, and I very much see the
use of it, but having it as the default for compiling old code is bad
news indeed.

No argument from me on accommodating static analysers; I think that's a
good thing.

Presumably then we have no other choice than to work around it, because
even if we can tell the compiler not to be a jerk today, that option
won't be feasible to use in the future.

Thanks again,
Andrew
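For readers unfamiliar with the terminology above: store "tearing" means the compiler legally splitting one logical store into several narrower ones. A hedged, generic illustration of the kind of code where this has been observed (e.g. a 32-bit constant store split into two 16-bit immediate stores on x86), together with the volatile-cast idiom that forbids it - this example is invented, not taken from the NetBSD tree:

```c
#include <stdint.h>

struct item {
	uint32_t flags;
};

/*
 * A plain full-width store of a constant like this one may legally be
 * "torn" by the compiler into two narrower immediate stores.  That is
 * fine single-threaded, but a concurrent reader expecting to see either
 * the old or the new value, never a mix, can observe a torn result.
 */
static void
set_flags_plain(struct item *it)
{
	it->flags = 0x00010002;
}

/*
 * Forcing the access through a volatile lvalue (the essence of the
 * READ_ONCE/WRITE_ONCE technique) obliges the compiler to emit exactly
 * one full-width store.
 */
static void
set_flags_once(struct item *it)
{
	*(volatile uint32_t *)&it->flags = 0x00010002;
}
```

Both functions compute the same result; the difference is only visible to a concurrent observer, which is precisely why such bugs escape single-threaded testing.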
Re: [patch] PT_GET_LWP_PRIVATE and PT_SET_LWP_PRIVATE
On 06.12.2019 06:17, Kamil Rytarowski wrote:
> I have implemented an l_private accessor in ptrace(2).
>
> By default this uses l_private from the lwp struct.
>
> PT_SET_LWP_PRIVATE uses lwp_setprivate() directly and this call
> abstracts internally MI and MD code.
>
> The PT_SET_LWP_PRIVATE operation uses by default l_private from the
> struct lwp, however whenever appropriate, picks MD logic that grabs
> this value from an MD field (register or pcb).
>
> MD code is implemented for:
> - sparc, sparc64
> - alpha
> - hppa
> - powerpc
> - sh3
>
> There is included compat32 support and ATF tests.
>
> http://netbsd.org/~kamil/patch-00202-PT_GET_LWP_PRIVATE.txt
>
> This ptrace operation is inspired by PTRACE_GET_THREAD_AREA /
> PTRACE_SET_THREAD_AREA from Linux.
>
> Please review.
>
> After merging this, I will add an entry in the man page ptrace(2) and
> implement core(5) generation of PT_GET_LWP_PRIVATE for each LWP within
> a process.

Actually, we can do better here and in one go remove one of the issues
in our current ptrace(2) API regarding the legacy operation PT_LWPINFO
(it is obsoleted by PT_GET_SIGINFO and confusing with a different
PT_LWPINFO found in FreeBSD).

I will come up with another and better patch.
Re: racy access in kern_runq.c
> Compilers have become much more aggressive over the years. But they
> are allowed to be so by the C standard. Specifically, in addition to
> code-level re-ordering, plain accesses (loads/stores) are subject to
> load/store fusing, tearing as well as invented loads/stores.

Then, honestly, it sounds to me as though "the latest revision of C" is
no longer an appropriate language for writing kernels. I see no reason
to twist the kernel code into a pretzel to work around latitude a new
language gives to its compilers - and that's what C11 is, a new
language, albeit one closely related to various previous languages.

One of the prime considerations when choosing a language and/or
compiler for building a kernel is that it produce relatively predictable
code, for exactly this kind of reason. If the latest C and the latest
gcc no longer do that, then IMO they are no longer appropriate for
writing/compiling kernels.

> While these primitives might be slightly verbose, they are in C11,
> they fix real bugs, they definitely make code less error-prone and
> they have other merits too (e.g. they accommodate static analysers
> which find some real bugs).

How many of those "real bugs" exist only because C11 gave compilers new
latitude to produce unexpected code?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: racy acccess in kern_runq.c
Le 06/12/2019 à 11:28, Andrew Doran a écrit :
> On Fri, Dec 06, 2019 at 10:27:20AM +0100, Maxime Villard wrote:
>
>> With 'worker_ci', there is an actual safety issue, because the compiler
>> could split the accesses and the hardware may not use atomics by
>> default like x86. This could cause random page faults; so it needs to
>> be strictly atomic.
>
> No I don't accept that.
>
> The ability to load and store a native word sized int (and in more
> recent years a pointer) with a single instruction is a fundamental
> assumption that every operating system written in C rests upon.

1) That it is done with a single instruction does not necessarily mean it
is atomic. There are many stupid things microcodes are allowed to do (not
on x86, however).

2) That there is a one-line assignment of a variable in C does not
necessarily mean that there will be one move instruction. There are many
things the compiler is allowed to do in its optimization passes, as said
already.

3) "every operating system written in C rests upon": I have some
difficulty with that. Linux has our equivalent of relaxed atomics (called
{READ,WRITE}_ONCE), with 1000+ combined references in their tree. So no,
here's a big open source kernel that does not rest upon unclear
assumptions for lockless operations. No way to check Windows, but I
wouldn't be surprised if they had a similar approach.

> If the compiler splits one of those accesses, then you are either using
> some other data type, or have a broken compiler on your hands.

Or you're just using a compiler that knows well enough how to arrange
instructions for optimized pipelining, and this compiler decides to
re-order the accesses based on desirable performance properties, as
allowed by the standard.

> https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html

I don't see anything in this link that is really reassuring. "You can
assume" that things are mostly ok on "systems we know of".

At least the rationale for {READ,WRITE}_ONCE is based on standards, and
not wild assumptions:

https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

To me, Kengo's 2nd patch remains needed.

Maxime
Re: racy acccess in kern_runq.c
On Fri, Dec 06, 2019 at 10:27:20AM +0100, Maxime Villard wrote:

> With 'worker_ci', there is an actual safety issue, because the compiler
> could split the accesses and the hardware may not use atomics by
> default like x86. This could cause random page faults; so it needs to
> be strictly atomic.

No I don't accept that.

The ability to load and store a native word sized int (and in more recent
years a pointer) with a single instruction is a fundamental assumption
that every operating system written in C rests upon.

If the compiler splits one of those accesses, then you are either using
some other data type, or have a broken compiler on your hands. If the
compiler is broken, it's the compiler you should be looking at, not the
program it compiled. It's as simple as that.

https://www.gnu.org/software/libc/manual/html_node/Atomic-Types.html

Andrew
Re: racy acccess in kern_runq.c
Hi,

On 2019/12/06 18:00, Andrew Doran wrote:
> Hi,
>
> On Fri, Dec 06, 2019 at 03:52:23PM +0900, Kengo NAKAHARA wrote:
>
>> There are some racy accesses in kern_runq.c detected by KCSAN. Those
>> racy access messages are so frequent that they drown out other
>> messages, so I want to fix them. They can be fixed by the following
>> patch.
>
> Please don't commit this. These accesses are racy by design. There is
> no safety issue and we do not want to disturb the activity of other
> CPUs in this code path by locking them. We also don't want to use
> atomics either. This code path is extremely hot; using atomics would
> impose a severe performance penalty on the system under certain
> workloads.

I see; I won't commit it. Indeed, I am worried about performance
degradation.

> Also if you change this to use strong ordering, you're quite likely to
> change the selection behaviour of LWPs. This is something delicate that
> exhibits reasonable scheduling behaviour in large part through
> randomness, i.e. by accident, so serialising it is likely to have
> strong effects on how LWPs are distributed.
>
> Lastly I have a number of experimental changes to this code which I'll
> be committing in the near future, allowing people to change the LWP
> selection policy to see if we can improve performance under different
> workloads. They will also likely show up in KCSAN as racy/dangerous
> accesses.
>
> My suggestion is to find a way to teach KCSAN that we know something is
> racy, we like it being racy, and that it's not a problem, so that it no
> longer shows up in the KCSAN output.

I see. I will add the __nocsan attribute to the functions in my local
sources to find other races.

Thanks,

-- 
// Internet Initiative Japan Inc.
Device Engineering Section,
Product Development Department,
Product Division,
Technology Unit
Kengo NAKAHARA
Re: racy acccess in kern_runq.c
Le 06/12/2019 à 10:00, Andrew Doran a écrit :
> Hi,
>
> On Fri, Dec 06, 2019 at 03:52:23PM +0900, Kengo NAKAHARA wrote:
>
>> There are some racy accesses in kern_runq.c detected by KCSAN. Those
>> racy access messages are so frequent that they drown out other
>> messages, so I want to fix them. They can be fixed by the following
>> patch.
>
> Please don't commit this. These accesses are racy by design. There is
> no safety issue and we do not want to disturb the activity of other
> CPUs in this code path by locking them. We also don't want to use
> atomics either. This code path is extremely hot; using atomics would
> impose a severe performance penalty on the system under certain
> workloads.

With 'worker_ci', there is an actual safety issue, because the compiler
could split the accesses and the hardware may not use atomics by default
like x86. This could cause random page faults; so it needs to be strictly
atomic.

Apart from that, yes, there is no other safety issue and locking would be
too expensive. All we possibly care about is making sure the accesses
aren't split, for the sake of not basing the scheduling policy on
systematic garbage, and atomic_relaxed seems like the right thing to do
(Kengo's 2nd patch), considering that it costs ~nothing.

> Also if you change this to use strong ordering, you're quite likely to
> change the selection behaviour of LWPs. This is something delicate that
> exhibits reasonable scheduling behaviour in large part through
> randomness, i.e. by accident, so serialising it is likely to have
> strong effects on how LWPs are distributed.
>
> Lastly I have a number of experimental changes to this code which I'll
> be committing in the near future, allowing people to change the LWP
> selection policy to see if we can improve performance under different
> workloads. They will also likely show up in KCSAN as racy/dangerous
> accesses.
>
> My suggestion is to find a way to teach KCSAN that we know something is
> racy, we like it being racy, and that it's not a problem, so that it no
> longer shows up in the KCSAN output.

Maybe we should indeed have a macro to say "yes this access is racy but
we don't care". Currently this macro is atomic_{store,load}_relaxed()...
Re: racy acccess in kern_runq.c
Le 06/12/2019 à 07:52, Kengo NAKAHARA a écrit :
> Hi,
>
> There are some racy accesses in kern_runq.c detected by KCSAN. Those
> racy access messages are so frequent that they drown out other
> messages, so I want to fix them. They can be fixed by the following
> patch.
>
>
> diff --git a/sys/kern/kern_runq.c b/sys/kern/kern_runq.c
> index 04ff1732016..bb0815689cd 100644
> --- a/sys/kern/kern_runq.c
> +++ b/sys/kern/kern_runq.c
> @@ -627,7 +627,7 @@ sched_balance(void *nocallout)
>  	/* Make lockless countings */
>  	for (CPU_INFO_FOREACH(cii, ci)) {
>  		spc = &ci->ci_schedstate;
> -
> +		spc_lock(ci);
>  		/*
>  		 * Average count of the threads
>  		 *
> @@ -643,10 +643,11 @@
>  			hci = ci;
>  			highest = spc->spc_avgcount;
>  		}
> +		spc_unlock(ci);
>  	}
>
>  	/* Update the worker */
> -	worker_ci = hci;
> +	atomic_swap_ptr(&worker_ci, hci);
>
>  	if (nocallout == NULL)
>  		callout_schedule(&balance_ch, balance_period);
> @@ -734,12 +735,15 @@ sched_idle(void)
>  	spc_unlock(ci);
>
>  no_migration:
> +	spc_lock(ci);
>  	if ((spc->spc_flags & SPCF_OFFLINE) != 0 || spc->spc_count != 0) {
> +		spc_unlock(ci);
>  		return;
>  	}
>
>  	/* Reset the counter, and call the balancer */
>  	spc->spc_avgcount = 0;
> +	spc_unlock(ci);
>  	sched_balance(ci);
>  	tci = worker_ci;
>  	tspc = &tci->ci_schedstate;

It just so happens we were talking about this yesterday.

worker_ci indeed needs to be a real atomic, but it means we also have to
fetch it atomically here:

	744		tci = worker_ci;

For the other variables, my understanding was that we don't care about
races, but only care about making sure the accesses are not split, so it
seems to me atomic_relaxed would suffice.
Re: racy acccess in kern_runq.c
Hi,

On Fri, Dec 06, 2019 at 03:52:23PM +0900, Kengo NAKAHARA wrote:

> There are some racy accesses in kern_runq.c detected by KCSAN. Those
> racy access messages are so frequent that they drown out other
> messages, so I want to fix them. They can be fixed by the following
> patch.

Please don't commit this. These accesses are racy by design. There is no
safety issue and we do not want to disturb the activity of other CPUs in
this code path by locking them. We also don't want to use atomics either.
This code path is extremely hot; using atomics would impose a severe
performance penalty on the system under certain workloads.

Also if you change this to use strong ordering, you're quite likely to
change the selection behaviour of LWPs. This is something delicate that
exhibits reasonable scheduling behaviour in large part through
randomness, i.e. by accident, so serialising it is likely to have strong
effects on how LWPs are distributed.

Lastly I have a number of experimental changes to this code which I'll be
committing in the near future, allowing people to change the LWP
selection policy to see if we can improve performance under different
workloads. They will also likely show up in KCSAN as racy/dangerous
accesses.

My suggestion is to find a way to teach KCSAN that we know something is
racy, we like it being racy, and that it's not a problem, so that it no
longer shows up in the KCSAN output.

Thank you,

Andrew
Re: racy acccess in kern_runq.c
Hi,

Thank you for your comment.

On 2019/12/06 16:42, Maxime Villard wrote:
> Le 06/12/2019 à 07:52, Kengo NAKAHARA a écrit :
>> Hi,
>>
>> There are some racy accesses in kern_runq.c detected by KCSAN. Those
>> racy access messages are so frequent that they drown out other
>> messages, so I want to fix them. They can be fixed by the following
>> patch.
>>
>> [...]
>
> It just so happens we were talking about this yesterday.

Oh, I missed the thread, sorry.

> worker_ci indeed needs to be a real atomic, but it means we also have
> to fetch it atomically here:
>
> 	744		tci = worker_ci;
>
> For the other variables, my understanding was that we don't care about
> races, but only care about making sure the accesses are not split, so
> it seems to me atomic_relaxed would suffice.

I see. I have updated the patch.

diff --git a/sys/kern/kern_runq.c b/sys/kern/kern_runq.c
index 04ff1732016..dbd0f73585a 100644
--- a/sys/kern/kern_runq.c
+++ b/sys/kern/kern_runq.c
@@ -627,6 +627,7 @@ sched_balance(void *nocallout)
 	/* Make lockless countings */
 	for (CPU_INFO_FOREACH(cii, ci)) {
 		spc = &ci->ci_schedstate;
+		u_int avg, mcount;
 
 		/*
 		 * Average count of the threads
@@ -634,19 +635,20 @@
 		 * The average is computed as a fixpoint number with
 		 * 8 fractional bits.
 		 */
-		spc->spc_avgcount = (
-		    weight * spc->spc_avgcount + (100 - weight) * 256 * spc->spc_mcount
-		) / 100;
+		avg = atomic_load_relaxed(&spc->spc_avgcount);
+		mcount = atomic_load_relaxed(&spc->spc_mcount);
+		avg = (weight * avg + (100 - weight) * 256 * mcount) / 100;
+		atomic_store_relaxed(&spc->spc_avgcount, avg);
 
 		/* Look for CPU with the highest average */
-		if (spc->spc_avgcount > highest) {
+		if (avg > highest) {
 			hci = ci;
-			highest = spc->spc_avgcount;
+			highest = avg;
 		}
 	}
 
 	/* Update the worker */
-	worker_ci = hci;
+	atomic_swap_ptr(&worker_ci, hci);
 
 	if (nocallout == NULL)
 		callout_schedule(&balance_ch, balance_period);
@@ -734,14 +736,15 @@ sched_idle(void)
 	spc_unlock(ci);
 
 no_migration:
-	if ((spc->spc_flags & SPCF_OFFLINE) != 0 || spc->spc_count != 0) {
+	if ((spc->spc_flags & SPCF_OFFLINE) != 0
+	    || atomic_load_relaxed(&spc->spc_count) != 0) {
 		return;
 	}
 
 	/* Reset the counter, and call the balancer */
-	spc->spc_avgcount = 0;
+	atomic_store_relaxed(&spc->spc_avgcount, 0);
 	sched_balance(ci);
-	tci = worker_ci;
+	atomic_swap_ptr(&tci, worker_ci);
 	tspc = &tci->ci_schedstate;
 	if (ci == tci || spc->spc_psid != tspc->spc_psid)
 		return;

Is this appropriate?

Thanks,

-- 
// Internet Initiative Japan Inc.
Device Engineering Section,
Product Development Department,
Product Division,
Technology Unit
Kengo NAKAHARA