Re: 8.0 performance issue when running build.sh?
> On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
>
> Actually, I wonder if we could kill off the time spent by fileassoc. Is
> it still used only by veriexec? We can easily option that out of the build
> box kernels.

Indeed. And there are better ways to do what veriexec does, in any case.

-- thorpej
Re: 8.0 performance issue when running build.sh?
I would be interested to finish that off; I need to make some time to get
to doing it, though. I have been sitting on some changes to veriexec for
~ years that change it from locking everything to using reference counts
and condition variables, which removes some nasty hacks I did. I have not
committed the changes because the kernel would sometimes deadlock and I
was having trouble tracking down why. Perhaps I was looking in the wrong
spot for the error and it was fileassoc all along that was causing the
deadlock.

----- Original Message -----
From: "Mindaugas Rasiukevicius"
To: "Jason Thorpe"
Cc:
Sent: Fri, 10 Aug 2018 00:12:23 +0100
Subject: Re: 8.0 performance issue when running build.sh?

Jason Thorpe wrote:
> >
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
> >
> > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > it still used only by veriexec? We can easily option that out of the
> > build box kernels.
>
> Indeed. And there are better ways to do what veriexec does, in any case.

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- 
Mindaugas
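The change described above -- replacing long-held locks with reference
counts plus a condition variable that lets a destroyer wait for readers
to drain -- can be sketched in userspace with pthreads. This is only an
illustration of the pattern under discussion, not the actual veriexec
code; all names here (`entry_acquire`, `entry_release`, `entry_destroy`)
are invented for the example.

```c
#include <pthread.h>

/* An object protected by a reference count instead of a lock held
 * across the whole operation: lookups take a reference, and a thread
 * tearing the object down waits on a condvar until refs drain to 0. */
struct entry {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	int             refs;
	int             dying;
};

/* Take a reference; fails if the entry is already being destroyed. */
static int
entry_acquire(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	if (e->dying) {
		pthread_mutex_unlock(&e->lock);
		return -1;
	}
	e->refs++;
	pthread_mutex_unlock(&e->lock);
	return 0;
}

/* Drop a reference; wake any waiting destroyer when it hits zero. */
static void
entry_release(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	if (--e->refs == 0)
		pthread_cond_broadcast(&e->cv);
	pthread_mutex_unlock(&e->lock);
}

/* Mark the entry dying, then wait for all references to drain. */
static void
entry_destroy(struct entry *e)
{
	pthread_mutex_lock(&e->lock);
	e->dying = 1;
	while (e->refs > 0)
		pthread_cond_wait(&e->cv, &e->lock);
	pthread_mutex_unlock(&e->lock);
	/* now safe to free the entry */
}
```

The deadlock risk mentioned in the mail typically comes from a thread
that waits for the count to drain while itself still holding a
reference, which is why tracking down such bugs is painful.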
Re: 8.0 performance issue when running build.sh?
t...@panix.com (Thor Lancelot Simon) writes:

>> 47.43    846     6.72 kernel_lock  fileassoc_file_delete+20

>Actually, I wonder if we could kill off the time spent by fileassoc.

Would be a marginal improvement only. The real scaling problems are in UVM.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: verbose vflushbuf()
dholland-t...@netbsd.org (David Holland) writes:

>Probably, but I don't think it's supposed to happen and possibly it
>should be a panic:

It can regularly happen under load, and the retry is supposed to handle
that condition. Still, it shouldn't occur frequently, so in this case
there is a problem.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: panic: biodone2 already
Emmanuel Dreyfus wrote:
> > xbd is not mpsafe, so there shouldn't even be a race due to parallel
> > processing on different CPUs. Maybe it would be useful to check if the
> > problem still happens when you assign just a single CPU to the DOMU.
>
> I get the crash with vcpu = 1 for the domU. I also tried to pin a single
> cpu for the test domU, I still get it to crash:

I started tracing the code to see where the problem comes from. So far,
I can tell that in vfs_bio.c, bread() -> bio_doread() will call
VOP_STRATEGY once for the offending buf_t, but biodone() is called twice
in interrupt context for that buf_t, leading to the "biodone2 already"
panic later.

Since you know the xbd code, you could save me some time: where do we go
below VOP_STRATEGY?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
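For readers following along: the panic fires because a buffer records
that it has already completed, and a second completion on the same
buffer is detected as a driver bug. A minimal userspace illustration of
that guard (not the actual vfs_bio.c or xbd code; `struct buf` and
`bio_complete` here are stand-ins):

```c
/* A buffer that remembers whether I/O on it has already completed;
 * b_done stands in for the kernel's done flag on a real buf_t. */
struct buf {
	int b_done;
};

/* First completion succeeds and marks the buffer done; a second
 * completion on the same buffer is the "biodone2 already" condition
 * that the kernel turns into a panic. Returns 0 on the first call,
 * -1 on a double completion. */
static int
bio_complete(struct buf *bp)
{
	if (bp->b_done)
		return -1;	/* double completion: bug in the caller */
	bp->b_done = 1;
	return 0;
}
```

So the question above reduces to: which path below VOP_STRATEGY in the
xbd interrupt handling manages to complete the same buffer twice.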
Re: verbose vflushbuf()
J. Hannken-Illjes wrote:
> For me it triggers for mounted block devices only and I suppose the
> vnode lock doesn't help here.

I have not yet fully understood the thing, but I suspect it is related
to snapshots.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: 8.0 performance issue when running build.sh?
Jason Thorpe wrote:
> >
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
> >
> > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > it still used only by veriexec? We can easily option that out of the
> > build box kernels.
>
> Indeed. And there are better ways to do what veriexec does, in any case.

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- 
Mindaugas
Re: 8.0 performance issue when running build.sh?
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00   2054    14.18 kernel_lock
> > >>  47.43    846     6.72 kernel_lock  fileassoc_file_delete+20
> > >>  23.73    188     3.36 kernel_lock  intr_biglock_wrapper+16
> > >>  16.01    203     2.27 kernel_lock  scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > > it still used only by veriexec? We can easily option that out of the
> > > build box kernels.
> >
> > Or even better, make it less heavy?
> >
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
>
> If it is not in use, fileassoc_file_delete will short cut already.

...and of course, the check seems to be just useless. So yes, it should
be possible to make it much less heavy.

Joerg
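The optimization being discussed -- short-circuiting before the locked
lookup -- can be sketched as a cheap "is fileassoc in use at all?" test
that needs no kernel_lock. This is a hypothetical userspace sketch of
the idea, not the real fileassoc(9) code; `fileassoc_nhooks`,
`fileassoc_in_use`, and `fileassoc_file_delete_fast` are invented names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Count of registered fileassoc hooks. If nothing (e.g. veriexec)
 * ever registered, every file delete can skip the locked path. */
static _Atomic unsigned fileassoc_nhooks;

/* Lock-free fast-path check: safe to read without kernel_lock. */
static bool
fileassoc_in_use(void)
{
	return atomic_load_explicit(&fileassoc_nhooks,
	    memory_order_acquire) != 0;
}

/* The delete hook: in the common case (no hooks registered) this
 * returns immediately, which is the "much less heavy" behavior the
 * thread is asking for. */
static void
fileassoc_file_delete_fast(void)
{
	if (!fileassoc_in_use())
		return;		/* no kernel_lock taken at all */
	/* slow path: take the big lock, look up and clear entries */
}
```

With such a check, the lockstat profile above would no longer show
fileassoc_file_delete contending on kernel_lock during a plain build.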
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.00   2054    14.18 kernel_lock
> >>  47.43    846     6.72 kernel_lock  fileassoc_file_delete+20
> >>  23.73    188     3.36 kernel_lock  intr_biglock_wrapper+16
> >>  16.01    203     2.27 kernel_lock  scsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc. Is
> > it still used only by veriexec? We can easily option that out of the build
> > box kernels.
>
> Or even better, make it less heavy?
>
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete will short cut already.

Joerg
Re: 8.0 performance issue when running build.sh?
2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
>> 100.00   2054    14.18 kernel_lock
>>  47.43    846     6.72 kernel_lock  fileassoc_file_delete+20
>>  23.73    188     3.36 kernel_lock  intr_biglock_wrapper+16
>>  16.01    203     2.27 kernel_lock  scsipi_adapter_request+63
> Actually, I wonder if we could kill off the time spent by fileassoc. Is
> it still used only by veriexec? We can easily option that out of the build
> box kernels.

Or even better, make it less heavy?

It's not really intuitive that you could improve filesystem
performance by removing this obscure component.

Jaromir
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
>
> Elapsed time: 1564.93 seconds.
>
> -- Kernel lock spin
>
> Total%  Count  Time/ms  Lock         Caller
> ------ ------ -------- ------------ ------------------------------
> 100.00   2054    14.18 kernel_lock
>  47.43    846     6.72 kernel_lock  fileassoc_file_delete+20
>  23.73    188     3.36 kernel_lock  intr_biglock_wrapper+16
>  16.01    203     2.27 kernel_lock  scsipi_adapter_request+63
>   5.29    662     0.75 kernel_lock  VOP_POLL+93
>   5.29     95     0.75 kernel_lock  biodone2+81
>   0.91     15     0.13 kernel_lock  sleepq_block+1c5
>   0.60     21     0.08 kernel_lock  frag6_fasttimo+1a
>   0.29      9     0.04 kernel_lock  ip_slowtimo+1a
>   0.27      2     0.04 kernel_lock  VFS_SYNC+65
>   0.07      2     0.01 kernel_lock  callout_softclock+42c
>   0.06      3     0.01 kernel_lock  nd6_timer_work+49
>   0.05      4     0.01 kernel_lock  frag6_slowtimo+1f
>   0.01      4     0.00 kernel_lock  kevent1+698
>
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc. Is
it still used only by veriexec? We can easily option that out of the build
box kernels.

-- 
Thor Lancelot Simon                                  t...@panix.com
"Whether or not there's hope for change is not the question. If you want
to be a free person, you don't stand up for human rights because it will
work, but because it is right." --Andrei Sakharov
Re: verbose vflushbuf()
> On 9. Aug 2018, at 19:03, David Holland wrote:
>
> On Thu, Aug 09, 2018 at 12:44:28PM +, Emmanuel Dreyfus wrote:
>> It seems we have something like a debug message left in
>> src/sys/kern/vfs_subr.c:vflushbuf()
>>
>>         if (dirty) {
>>                 vprint("vflushbuf: dirty", vp);
>>                 goto loop;
>>         }
>>
>> It has been there for a while (7 years). Is there a reason
>> why it remains always enabled? I have a machine that hit
>> the place in a loop, getting stuck for hours printing
>> messages on the console. Is it safe to #ifdef DEBUG this
>> printf?
>
> Probably, but I don't think it's supposed to happen and possibly it
> should be a panic:
>
> /*
>  * Called with the underlying vnode locked, which should prevent new dirty
>  * buffers from being queued.
>  */

For me it triggers for mounted block devices only and I suppose the
vnode lock doesn't help here.

-- 
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Re: verbose vflushbuf()
On Thu, Aug 09, 2018 at 12:44:28PM +, Emmanuel Dreyfus wrote:
> It seems we have something like a debug message left in
> src/sys/kern/vfs_subr.c:vflushbuf()
>
>         if (dirty) {
>                 vprint("vflushbuf: dirty", vp);
>                 goto loop;
>         }
>
> It has been there for a while (7 years). Is there a reason
> why it remains always enabled? I have a machine that hit
> the place in a loop, getting stuck for hours printing
> messages on the console. Is it safe to #ifdef DEBUG this
> printf?

Probably, but I don't think it's supposed to happen and possibly it
should be a panic:

/*
 * Called with the underlying vnode locked, which should prevent new dirty
 * buffers from being queued.
 */

-- 
David A. Holland
dholl...@netbsd.org
verbose vflushbuf()
Hello

It seems we have something like a debug message left in
src/sys/kern/vfs_subr.c:vflushbuf()

        if (dirty) {
                vprint("vflushbuf: dirty", vp);
                goto loop;
        }

It has been there for a while (7 years). Is there a reason
why it remains always enabled? I have a machine that hit
the place in a loop, getting stuck for hours printing
messages on the console. Is it safe to #ifdef DEBUG this
printf?

-- 
Emmanuel Dreyfus
m...@netbsd.org
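The proposed change is small: keep the retry loop, but compile the
diagnostic out unless DEBUG is defined, so a machine that keeps finding
dirty buffers does not flood its console. A hedged userspace sketch of
the shape of that change (names and the simulated flush are stand-ins,
not the real vfs_subr.c code):

```c
#include <stdio.h>

/* Simulates the vflushbuf() retry loop: each pass "flushes" one dirty
 * buffer and retries until none remain. The console message is only
 * emitted when built with -DDEBUG, which is the change being proposed.
 * Returns the number of retry passes taken. */
static int
vflushbuf_sim(int dirty)
{
	int passes = 0;
loop:
	if (dirty) {
#ifdef DEBUG
		/* stands in for: vprint("vflushbuf: dirty", vp); */
		printf("vflushbuf: dirty\n");
#endif
		passes++;
		dirty--;	/* pretend one buffer got written out */
		goto loop;
	}
	return passes;
}
```

A machine stuck in this loop for hours, as described above, would retry
silently with the DEBUG guard in place instead of printing on every pass.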
Re: repeated panics in mutex_vector_enter (from unp_thread)
> Reader / writer lock error: lockdebug_wantlock: locking against myself

Turns out this is an entirely different problem. The backtrace is

lockdebug_more <- rw_enter <- fr_check <- pfil_run_hooks <- ip6_output <-
nd6_ns_output <- nd6_output <- fr_fastroute <- fr_send_ip <-
fr_send_reset <- fr_check <- pfil_run_hooks <- ip6_input <- ip6intr <-
softint_dispatch

I guess this might have long been fixed. So I'll try building and
running an -8 kernel.
Re: 8.0 performance issue when running build.sh?
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%  Count  Time/ms  Lock         Caller
------ ------ -------- ------------ ------------------------------
100.00   2054    14.18 kernel_lock
 47.43    846     6.72 kernel_lock  fileassoc_file_delete+20
 23.73    188     3.36 kernel_lock  intr_biglock_wrapper+16
 16.01    203     2.27 kernel_lock  scsipi_adapter_request+63
  5.29    662     0.75 kernel_lock  VOP_POLL+93
  5.29     95     0.75 kernel_lock  biodone2+81
  0.91     15     0.13 kernel_lock  sleepq_block+1c5
  0.60     21     0.08 kernel_lock  frag6_fasttimo+1a
  0.29      9     0.04 kernel_lock  ip_slowtimo+1a
  0.27      2     0.04 kernel_lock  VFS_SYNC+65
  0.07      2     0.01 kernel_lock  callout_softclock+42c
  0.06      3     0.01 kernel_lock  nd6_timer_work+49
  0.05      4     0.01 kernel_lock  frag6_slowtimo+1f
  0.01      4     0.00 kernel_lock  kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request a pullup?

Martin