Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Jason Thorpe


> On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
> 
> Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> it still used only by veriexec?  We can easily option that out of the build
> box kernels.

Indeed.  And there are better ways to do what veriexec does, in any case.

-- thorpej



Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Brett Lymn
 
I would be interested to finish that off; I need to make some time to
get to doing it, though.
I have been sitting on some changes to veriexec for ~ years that
change it from locking everything to using reference counts and
condition variables, which removes some nasty hacks I did.  I have not
committed the changes because the kernel would sometimes deadlock and
I was having trouble tracking down why.  Perhaps I was looking in the
wrong spot for the error and it was fileassoc all along that was
causing the deadlock.
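
(For anyone who has not seen the pattern: the general shape is a per-entry
reference count guarded by a mutex, with a condition variable so teardown
can wait for the count to drain.  The sketch below only illustrates that
pattern with invented names -- it is not the actual veriexec patch.)

#include <sys/param.h>
#include <sys/mutex.h>
#include <sys/condvar.h>

struct ve_entry {
        kmutex_t        ve_lock;
        kcondvar_t      ve_cv;
        unsigned int    ve_refcnt;
        bool            ve_dying;
};

/* Take a reference; fail if the entry is already being torn down. */
static bool
ve_entry_acquire(struct ve_entry *ve)
{
        mutex_enter(&ve->ve_lock);
        if (ve->ve_dying) {
                mutex_exit(&ve->ve_lock);
                return false;
        }
        ve->ve_refcnt++;
        mutex_exit(&ve->ve_lock);
        return true;
}

/* Drop a reference; wake a waiting destroyer when the last one goes. */
static void
ve_entry_release(struct ve_entry *ve)
{
        mutex_enter(&ve->ve_lock);
        if (--ve->ve_refcnt == 0)
                cv_broadcast(&ve->ve_cv);
        mutex_exit(&ve->ve_lock);
}

/* Mark the entry dying and wait for outstanding references to drain. */
static void
ve_entry_drain(struct ve_entry *ve)
{
        mutex_enter(&ve->ve_lock);
        ve->ve_dying = true;
        while (ve->ve_refcnt != 0)
                cv_wait(&ve->ve_cv, &ve->ve_lock);
        mutex_exit(&ve->ve_lock);
}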

- Original Message -
From: "Mindaugas Rasiukevicius" 
To: "Jason Thorpe" 
Cc:
Sent: Fri, 10 Aug 2018 00:12:23 +0100
Subject: Re: 8.0 performance issue when running build.sh?

 Jason Thorpe  wrote:
 >
 > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
 > >
 > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
 > > it still used only by veriexec?  We can easily option that out of the
 > > build box kernels.
 >
 > Indeed.  And there are better ways to do what veriexec does, in any case.
 >

 Many years ago I wrote a diff to make fileassoc MP-safe:

 http://www.netbsd.org/~rmind/fileassoc.diff

 If somebody wants to finish -- I am glad to help.

 -- 
 Mindaugas




Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Michael van Elst
t...@panix.com (Thor Lancelot Simon) writes:

>>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20

>Actually, I wonder if we could kill off the time spent by fileassoc.

Would be a marginal improvement only. The real scaling problems are in UVM.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: verbose vflushbuf()

2018-08-09 Thread Michael van Elst
dholland-t...@netbsd.org (David Holland) writes:

>Probably, but I don't think it's supposed to happen and possibly it
>should be a panic:

It can regularly happen under load and the retry is supposed to handle that
condition. Still, it shouldn't occur frequently, so in this case there
is a problem.


-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: panic: biodone2 already

2018-08-09 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> > xbd is not mpsafe, so it shouldn't even be a race due to parallel
> > processing on different CPUs. Maybe it would be useful to check if the
> > problem still happens when you assign just a single CPU to the domU.
> 
> I get the crash with vcpu = 1 for the domU. I also tried to pin a single
> CPU for the test domU, and I still get it to crash:

I started tracing the code to see where the problem comes from. So far,
I can tell that in vfs_bio.c, bread() -> bio_doread() calls
VOP_STRATEGY once for the offending buf_t, but biodone() is called twice
in interrupt context for that buf_t, leading to the "biodone2 already"
panic later.

Since you know the xbd code, you could save me some time: where do we go
below VOP_STRATEGY?
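
(For reference, one way to catch the second completion at its source -- a
debugging sketch only, not the real biodone()/biodone2() code; b_done_pc
is a made-up field that struct buf does not have, and it would need to be
zeroed wherever the buffer is initialised:)

#include <sys/param.h>
#include <sys/buf.h>

/*
 * Debugging sketch: wrap biodone() and remember who completed a buffer,
 * so the second completion can report both callers right away instead
 * of panicking later in biodone2().
 */
void
biodone_traced(buf_t *bp)
{
        if (bp->b_done_pc != NULL) {
                panic("buf %p completed twice: first %p, now %p",
                    bp, bp->b_done_pc, __builtin_return_address(0));
        }
        bp->b_done_pc = __builtin_return_address(0);
        biodone(bp);
}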

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: verbose vflushbuf()

2018-08-09 Thread Emmanuel Dreyfus
J. Hannken-Illjes  wrote:

> For me it triggers for mounted block devices only and I suppose the
> vnode lock doesn't help here.

I have not yet fully understood the thing, but I suspect it is related
to snapshots.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Mindaugas Rasiukevicius
Jason Thorpe  wrote:
> 
> 
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
> > 
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the
> > build box kernels.
> 
> Indeed.  And there are better ways to do what veriexec does, in any case.
> 

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- 
Mindaugas


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00    2054     14.18 kernel_lock
> > >>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
> > >>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
> > >>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > > it still used only by veriexec?  We can easily option that out of the
> > > build box kernels.
> > 
> > Or even better, make it less heavy?
> > 
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
> 
> If it is not in use, fileassoc_file_delete already short-circuits.

...and of course, the check seems to be just useless. So yes, it should
be possible to make it much less heavy.
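
For illustration only -- this is not the actual fileassoc code, and
fileassoc_nentries is a made-up stand-in for whatever "is anything
registered at all?" test the real code performs -- the cheap fast path
would look something like:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/vnode.h>

static volatile unsigned int fileassoc_nentries;        /* hypothetical */

int
fileassoc_file_delete_sketch(struct vnode *vp)
{
        /* Nothing registered anywhere: return before touching kernel_lock. */
        if (__predict_true(fileassoc_nentries == 0))
                return ENOENT;

        KERNEL_LOCK(1, NULL);
        /* ... look up and remove any associations for vp ... */
        KERNEL_UNLOCK_ONE(NULL);
        return 0;
}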

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.00    2054     14.18 kernel_lock
> >>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
> >>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
> >>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the build
> > box kernels.
> 
> Or even better, make it less heavy?
> 
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete already short-circuits.

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Jaromír Doleček
2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
>> 100.00    2054     14.18 kernel_lock
>>  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
>>  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
>>  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
> Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> it still used only by veriexec?  We can easily option that out of the build
> box kernels.

Or even better, make it less heavy?

It's not really intuitive that you could improve filesystem
performance by removing this obscure component.

Jaromir


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Thor Lancelot Simon
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
> 
> Elapsed time: 1564.93 seconds.
> 
> -- Kernel lock spin
> 
> Total%   Count   Time/ms  Lock          Caller
> ------- ------- --------- ------------- --------------------------
>  100.00    2054     14.18 kernel_lock
>   47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
>   23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
>   16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
>    5.29     662      0.75 kernel_lock   VOP_POLL+93
>    5.29      95      0.75 kernel_lock   biodone2+81
>    0.91      15      0.13 kernel_lock   sleepq_block+1c5
>    0.60      21      0.08 kernel_lock   frag6_fasttimo+1a
>    0.29       9      0.04 kernel_lock   ip_slowtimo+1a
>    0.27       2      0.04 kernel_lock   VFS_SYNC+65
>    0.07       2      0.01 kernel_lock   callout_softclock+42c
>    0.06       3      0.01 kernel_lock   nd6_timer_work+49
>    0.05       4      0.01 kernel_lock   frag6_slowtimo+1f
>    0.01       4      0.00 kernel_lock   kevent1+698
> 
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc.  Is
it still used only by veriexec?  We can easily option that out of the build
box kernels.

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: verbose vflushbuf()

2018-08-09 Thread J. Hannken-Illjes



> On 9. Aug 2018, at 19:03, David Holland  wrote:
> 
> On Thu, Aug 09, 2018 at 12:44:28PM +, Emmanuel Dreyfus wrote:
>> It seems we have something like a debug message left in 
>> src/sys/kern/vfs_subr.c:vflushbuf()
>> 
>>      if (dirty) {
>>              vprint("vflushbuf: dirty", vp);
>>              goto loop;
>>      }
>> 
>> It has been there for a while (7 years). Is there a reason
>> why it always remains enabled? I have a machine that hit
>> the place in a loop, getting stuck for hours printing
>> messages on the console. Is it safe to #ifdef DEBUG this
>> printf?
> 
> Probably, but I don't think it's supposed to happen and possibly it
> should be a panic:
> 
> /*
> * Called with the underlying vnode locked, which should prevent new dirty
> * buffers from being queued.
> */

For me it triggers for mounted block devices only and I suppose the
vnode lock doesn't help here.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: verbose vflushbuf()

2018-08-09 Thread David Holland
On Thu, Aug 09, 2018 at 12:44:28PM +, Emmanuel Dreyfus wrote:
 > It seems we have something like a debug message left in 
 > src/sys/kern/vfs_subr.c:vflushbuf()
 > 
 >      if (dirty) {
 >              vprint("vflushbuf: dirty", vp);
 >              goto loop;
 >      }
 > 
 > It has been there for a while (7 years). Is there a reason
 > why it always remains enabled? I have a machine that hit
 > the place in a loop, getting stuck for hours printing
 > messages on the console. Is it safe to #ifdef DEBUG this
 > printf?

Probably, but I don't think it's supposed to happen and possibly it
should be a panic:

/*
 * Called with the underlying vnode locked, which should prevent new dirty
 * buffers from being queued.
 */
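
i.e. roughly (sketch only):

        if (dirty) {
                vprint("vflushbuf: dirty", vp);
                panic("vflushbuf: dirty bufs on a locked vnode");
        }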


-- 
David A. Holland
dholl...@netbsd.org


verbose vflushbuf()

2018-08-09 Thread Emmanuel Dreyfus
Hello

It seems we have something like a debug message left in 
src/sys/kern/vfs_subr.c:vflushbuf()

        if (dirty) {
                vprint("vflushbuf: dirty", vp);
                goto loop;
        }

It has been there for a while (7 years). Is there a reason
why it always remains enabled? I have a machine that hit
the place in a loop, getting stuck for hours printing
messages on the console. Is it safe to #ifdef DEBUG this
printf?
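
(The change being asked about would be just this, as a sketch:)

        if (dirty) {
#ifdef DEBUG
                vprint("vflushbuf: dirty", vp);
#endif
                goto loop;
        }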

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: repeated panics in mutex_vector_enter (from unp_thread)

2018-08-09 Thread Edgar Fuß
> Reader / writer lock error: lockdebug_wantlock: locking against myself
Turns out this is an entirely different problem.
The backtrace is
lockdebug_more<-rw_enter<-fr_check<-pfil_run_hooks<-ip6_output<-nd6_ns_output<-nd6_output<-fr_fastroute<-fr_send_ip<-fr_send_reset<-fr_check<-pfil_run_hooks<-ip6_input<-ip6intr<-softint_dispatch,
i.e. the reset that fr_check generates on input is pushed back through the
pfil hooks on output and re-enters fr_check, which then tries to take the
same rwlock again.

I guess this may long since have been fixed.

So I'll try building and running an -8 kernel.


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Martin Husemann
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%   Count   Time/ms  Lock          Caller
------- ------- --------- ------------- --------------------------
 100.00    2054     14.18 kernel_lock
  47.43     846      6.72 kernel_lock   fileassoc_file_delete+20
  23.73     188      3.36 kernel_lock   intr_biglock_wrapper+16
  16.01     203      2.27 kernel_lock   scsipi_adapter_request+63
   5.29     662      0.75 kernel_lock   VOP_POLL+93
   5.29      95      0.75 kernel_lock   biodone2+81
   0.91      15      0.13 kernel_lock   sleepq_block+1c5
   0.60      21      0.08 kernel_lock   frag6_fasttimo+1a
   0.29       9      0.04 kernel_lock   ip_slowtimo+1a
   0.27       2      0.04 kernel_lock   VFS_SYNC+65
   0.07       2      0.01 kernel_lock   callout_softclock+42c
   0.06       3      0.01 kernel_lock   nd6_timer_work+49
   0.05       4      0.01 kernel_lock   frag6_slowtimo+1f
   0.01       4      0.00 kernel_lock   kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request pullup?

Martin