re: 8.0 performance issue when running build.sh?

2018-08-10 Thread matthew green
> > >> 100.00  2054  14.18 kernel_lock
> > >>  47.43   846   6.72 kernel_lock  fileassoc_file_delete+20
> > >>  23.73   188   3.36 kernel_lock  intr_biglock_wrapper+16
> > >>  16.01   203   2.27 kernel_lock  scsipi_adapter_request+63

also note that this is 14.18ms of locked time across 26 minutes.
this isn't a contention point.


.mrg.


Re: 8.0 performance issue when running build.sh?

2018-08-10 Thread Jason Thorpe


> On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
> 
> Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> it still used only by veriexec?  We can easily option that out of the build
> box kernels.

Indeed.  And there are better ways to do what veriexec does, in any case.

-- thorpej



Re: 8.0 performance issue when running build.sh?

2018-08-10 Thread Brett Lymn
 
I would be interested to finish that off; I need to make some time to
get to it, though.
I have been sitting on some changes to veriexec for ~ years that
change it from locking everything to using reference counts and
condition variables, which removes some nasty hacks I did.  I have not
committed the changes because the kernel would sometimes deadlock and
I was having trouble tracking down why.  Perhaps I was looking in the
wrong spot for the error and it was fileassoc all along that was
causing the deadlock.

- Original Message -
From: "Mindaugas Rasiukevicius" 
To: "Jason Thorpe" 
Cc:
Sent: Fri, 10 Aug 2018 00:12:23 +0100
Subject: Re: 8.0 performance issue when running build.sh?

 Jason Thorpe  wrote:
 > 
 > 
 > > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
 > > 
 > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
 > > it still used only by veriexec?  We can easily option that out of the
 > > build box kernels.
 > 
 > Indeed. And there are better ways to do what veriexec does, in any
case.
 > 

 Many years ago I wrote a diff to make fileassoc MP-safe:

 http://www.netbsd.org/~rmind/fileassoc.diff

 If somebody wants to finish -- I am glad to help.

 -- 
 Mindaugas




Re: 8.0 performance issue when running build.sh?

2018-08-10 Thread Michael van Elst
t...@panix.com (Thor Lancelot Simon) writes:

>>  47.43   846   6.72 kernel_lock  fileassoc_file_delete+20

>Actually, I wonder if we could kill off the time spent by fileassoc.

Would be a marginal improvement only. The real scaling problems are in UVM.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Mindaugas Rasiukevicius
Jason Thorpe  wrote:
> 
> 
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon  wrote:
> > 
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the
> > build box kernels.
> 
> Indeed.  And there are better ways to do what veriexec does, in any case.
> 

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- 
Mindaugas


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00  2054  14.18 kernel_lock
> > >>  47.43   846   6.72 kernel_lock  fileassoc_file_delete+20
> > >>  23.73   188   3.36 kernel_lock  intr_biglock_wrapper+16
> > >>  16.01   203   2.27 kernel_lock  scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > > it still used only by veriexec?  We can easily option that out of the
> > > build box kernels.
> > 
> > Or even better, make it less heavy?
> > 
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
> 
> If it is not in use, fileassoc_file_delete will short cut already.

...and of course, the check seems to be useless anyway.  So yes, it
should be possible to make it much less heavy.

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Joerg Sonnenberger
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.002054 14.18 kernel_lock
> >>  47.43 846  6.72 kernel_lockfileassoc_file_delete+20
> >>  23.73 188  3.36 kernel_lockintr_biglock_wrapper+16
> >>  16.01 203  2.27 kernel_lockscsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the build
> > box kernels.
> 
> Or even better, make it less heavy?
> 
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete will short cut already.

Joerg


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Jaromír Doleček
2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
>> 100.00  2054  14.18 kernel_lock
>>  47.43   846   6.72 kernel_lock  fileassoc_file_delete+20
>>  23.73   188   3.36 kernel_lock  intr_biglock_wrapper+16
>>  16.01   203   2.27 kernel_lock  scsipi_adapter_request+63
> Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> it still used only by veriexec?  We can easily option that out of the build
> box kernels.

Or even better, make it less heavy?

It's not really intuitive that you could improve filesystem
performance by removing this obscure component.

Jaromir


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Thor Lancelot Simon
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
> 
> Elapsed time: 1564.93 seconds.
> 
> -- Kernel lock spin
> 
> Total%  Count   Time/ms  Lock   Caller
> -- --- - -- --
> 100.00  2054  14.18 kernel_lock
>  47.43   846   6.72 kernel_lock  fileassoc_file_delete+20
>  23.73   188   3.36 kernel_lock  intr_biglock_wrapper+16
>  16.01   203   2.27 kernel_lock  scsipi_adapter_request+63
>   5.29   662   0.75 kernel_lock  VOP_POLL+93
>   5.29    95   0.75 kernel_lock  biodone2+81
>   0.91    15   0.13 kernel_lock  sleepq_block+1c5
>   0.60    21   0.08 kernel_lock  frag6_fasttimo+1a
>   0.29     9   0.04 kernel_lock  ip_slowtimo+1a
>   0.27     2   0.04 kernel_lock  VFS_SYNC+65
>   0.07     2   0.01 kernel_lock  callout_softclock+42c
>   0.06     3   0.01 kernel_lock  nd6_timer_work+49
>   0.05     4   0.01 kernel_lock  frag6_slowtimo+1f
>   0.01     4   0.00 kernel_lock  kevent1+698
> 
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc.  Is
it still used only by veriexec?  We can easily option that out of the build
box kernels.

-- 
 Thor Lancelot Simon t...@panix.com
  "Whether or not there's hope for change is not the question.  If you
   want to be a free person, you don't stand up for human rights because
   it will work, but because it is right."  --Andrei Sakharov


Re: 8.0 performance issue when running build.sh?

2018-08-09 Thread Martin Husemann
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
100.00  2054  14.18 kernel_lock
 47.43   846   6.72 kernel_lock  fileassoc_file_delete+20
 23.73   188   3.36 kernel_lock  intr_biglock_wrapper+16
 16.01   203   2.27 kernel_lock  scsipi_adapter_request+63
  5.29   662   0.75 kernel_lock  VOP_POLL+93
  5.29    95   0.75 kernel_lock  biodone2+81
  0.91    15   0.13 kernel_lock  sleepq_block+1c5
  0.60    21   0.08 kernel_lock  frag6_fasttimo+1a
  0.29     9   0.04 kernel_lock  ip_slowtimo+1a
  0.27     2   0.04 kernel_lock  VFS_SYNC+65
  0.07     2   0.01 kernel_lock  callout_softclock+42c
  0.06     3   0.01 kernel_lock  nd6_timer_work+49
  0.05     4   0.01 kernel_lock  frag6_slowtimo+1f
  0.01     4   0.00 kernel_lock  kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request pullup?

Martin


Re: 8.0 performance issue when running build.sh?

2018-08-07 Thread J. Hannken-Illjes


> On 6. Aug 2018, at 23:18, Mindaugas Rasiukevicius  wrote:
> 
> Martin Husemann  wrote:
>> So here is a more detailed analysis using flamegraphs:
>> 
>>  https://netbsd.org/~martin/bc-perf/
>> 
>> 
>> All operations happen on tmpfs, and the locking there is a significant
>> part of the game - however, lots of mostly idle time is wasted and it is
>> not clear to me what is going on.
> 
> Just from a very quick look, it seems like a regression introduced with
> the vcache changes: the MP-safe flag is set too late and not inherited
> by the root vnode.
> 
> http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff

Very good catch.  @martin, could you try this diff on an autobuilder?

Looks like it speeds up tmpfs lookups by a factor of ~40 on -8.0.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)



Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Jason Thorpe


> On Aug 6, 2018, at 12:17 PM, Rhialto  wrote:
> 
> 21.96   24626  82447.93 kernel_lock  ip_slowtimo+1a


The work that's happening here looks like a scalability nightmare, regardless 
of holding the kernel lock or not.  A couple of things:

1- It should do a better job of determining if there's any work that it 
actually has to do.  As it is, it's going to lock the kernel_lock and traverse 
all of the buckets in a hash table even if there aren't any entries in it.  
Even in the NET_MPSAFE scenario, that's pointless work.

2- The kernel_lock, by its nature, is the lock-of-last-resort, and code needs 
to be doing everything it can to defer taking that lock until it is deemed 
absolutely necessary.  Even in the not-NET_MPSAFE scenario, there should be a 
way to determine that IP fragment expiration processing needs to actually occur 
before taking the kernel_lock.

-- thorpej



Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Michael van Elst
rhia...@falu.nl (Rhialto) writes:

>From the caller list, I get the impression that the statistics include
>(potentially) all system activity, not just work being done by/on behalf
>of the specified program. Otherwise I can't explain the network related
>names. All distfiles should have been there already (it was a rebuild of
>existing packages but for 8.0).

Yes, lockstat data includes all system activity.

> 29.41   59522 110425.74 kernel_lock  cdev_poll+72
> 21.96   24626  82447.93 kernel_lock  ip_slowtimo+1a
> 20.04   23020  75230.65 kernel_lock  frag6_slowtimo+1f
> 10.09   26272  37891.06 kernel_lock  frag6_fasttimo+1a

Network isn't MPSAFE yet, so it is natural that it generates contention
on a busy system.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Mindaugas Rasiukevicius
Martin Husemann  wrote:
> So here is a more detailed analysis using flamegraphs:
> 
>   https://netbsd.org/~martin/bc-perf/
> 
> 
> All operations happen on tmpfs, and the locking there is a significant
> part of the game - however, lots of mostly idle time is wasted and it is
> not clear to me what is going on.

Just from a very quick look, it seems like a regression introduced with
the vcache changes: the MP-safe flag is set too late and not inherited
by the root vnode.

http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff

-- 
Mindaugas


Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Rhialto
I happened to be rebuilding some packages on amd64/8.0 while I was
reading your article, so I thought I might as well measure something
myself.

My results are not directly comparable to yours. I build from local
rotating disk, not a ram disk. But I have 32 GB of RAM so there should
be a lot of caching.

From the caller list, I get the impression that the statistics include
(potentially) all system activity, not just work being done by/on behalf
of the specified program. Otherwise I can't explain the network related
names. All distfiles should have been there already (it was a rebuild of
existing packages but for 8.0).

NetBSD murthe.falu.nl 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

pkg_comp:default.conf# lockstat -L kernel_lock pkg_chk -ak
...
Elapsed time: 47954.90 seconds.

-- Kernel lock spin

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
100.00  618791 375442.52 kernel_lock
 29.41   59522 110425.74 kernel_lock  cdev_poll+72
 21.96   24626  82447.93 kernel_lock  ip_slowtimo+1a
 20.04   23020  75230.65 kernel_lock  frag6_slowtimo+1f
 10.09   26272  37891.06 kernel_lock  frag6_fasttimo+1a
  7.50   58970  28150.68 kernel_lock  VOP_POLL+93
  3.16   11940  11860.16 kernel_lock  sleepq_block+1b5
  2.99   56296  11207.05 kernel_lock  VOP_LOCK+b2
  1.69  104084   6338.54 kernel_lock  bdev_strategy+7f
  0.88    3939   3311.60 kernel_lock  fileassoc_file_delete+20
  0.72    5289   2700.97 kernel_lock  VOP_IOCTL+65
  0.45     703   1680.33 kernel_lock  nd6_timer_work+49
  0.40    3102   1500.71 kernel_lock  VOP_LOOKUP+5e
  0.26   17808    967.32 kernel_lock  VOP_UNLOCK+70
  0.14    6354    514.59 kernel_lock  VOP_WRITE+61
  0.06     115    236.77 kernel_lock  tcp_timer_keep+48
  0.04    7082    168.23 kernel_lock  tcp_rcvd_wrapper+18
  0.04   85504    144.38 kernel_lock  callout_softclock+2b0
  0.03    4775    103.15 kernel_lock  kevent1+692
  0.02    3725     79.60 kernel_lock  kqueue_register+419
  0.01   41908     56.07 kernel_lock  intr_biglock_wrapper+16
  0.01   56054     50.86 kernel_lock  biodone2+6d
  0.01      94     43.05 kernel_lock  cdev_read+72
  0.01      23     40.65 kernel_lock  udp_send_wrapper+2b
  0.01    6199     38.81 kernel_lock  VOP_READ+61
  0.01      20     32.44 kernel_lock  cdev_ioctl+88
  0.01     178     30.20 kernel_lock  VOP_GETATTR+5e
  0.01    1050     27.36 kernel_lock  knote_detach+134
  0.01      60     25.10 kernel_lock  VFS_SYNC+65
  0.01      43     21.64 kernel_lock  VOP_READDIR+6d
  0.01     635     19.13 kernel_lock  tcp_send_wrapper+22
  0.00     101     16.98 kernel_lock  cdev_open+be
  0.00       2     13.49 kernel_lock  ip_drain+e
  0.00      23     11.60 kernel_lock  udp6_send_wrapper+2b
  0.00    6764      9.57 kernel_lock  softint_dispatch+dc
  0.00       9      9.09 kernel_lock  tcp_timer_rexmt+52
  0.00       3      5.78 kernel_lock  tcp_attach_wrapper+18
  0.00       2      5.37 kernel_lock  mld_timeo+16
  0.00      21      5.10 kernel_lock  tcp_detach_wrapper+16
  0.00      28      5.07 kernel_lock  VOP_FCNTL+64
  0.00       1      5.03 kernel_lock  tcp_drain+1d
  0.00      31      3.28 kernel_lock  bdev_ioctl+88
  0.00     484      2.70 kernel_lock  doifioctl+8f3
  0.00     107      1.22 kernel_lock  VOP_ACCESS+5d
  0.00     809      0.91 kernel_lock  ip6intr+1f
  0.00       1      0.65 kernel_lock  rip6_send_wrapper+34
  0.00     101      0.47 kernel_lock  udp6_detach_wrapper+14
  0.00      62      0.37 kernel_lock  udp_attach_wrapper+16
  0.00      55      0.31 kernel_lock  udp_detach_wrapper+16
  0.00     261      0.18 kernel_lock  kqueue_register+25b
  0.00      37      0.13 kernel_lock  udp6_attach_wrapper+1a
  0.00     144      0.13 kernel_lock  ipintr+37
  0.00     216      0.11 kernel_lock  VOP_SEEK+9c
  0.00      32      0.08 kernel_lock  udp6_ctloutput_wrapper+1f
  0.00      12      0.03 kernel_lock  udp_ctloutput_wrapper+1f
  0.00       5      0.02 kernel_lock  tcp_ctloutput_wrapper+1f
  0.00      33      0.02 kernel_lock  usb_transfer_complete+122
  0.00       6      0.02 kernel_lock  tcp_peeraddr_wrapper+1b
  0.00      10      0.02 kernel_lock  tcp_disconnect_wrapper+18
  0.00       5      0.02 kernel_lock  udp6_bind_wrapper+1e
  0.00      10      0.01 kernel_lock  

Re: 8.0 performance issue when running build.sh?

2018-08-06 Thread Martin Husemann
So here is a more detailed analysis using flamegraphs:

https://netbsd.org/~martin/bc-perf/


All operations happen on tmpfs, and the locking there is a significant
part of the game - however, lots of mostly idle time is wasted and it is
not clear to me what is going on.

Initial tests hint at -current having the same issue.

Any ideas?

Martin


Re: 8.0 performance issue when running build.sh?

2018-08-01 Thread Martin Husemann
So after more investigation here is an update:

The effect depends highly on "too high" -j args to build.sh. As a stopgap
hack I reduced them on the build cluster and now the netbsd-7 builds are
slow, but not super slow (and all other builds are slightly slower).

I tested building of netbsd-7 on a netbsd-8 build cluster slave a lot
and the ideal -j value is 11 or 12 (only!) for a single build. Builds
for other branches apparently do better with higher -j values, but I have
not done extensive testing there (yet).

With a bit of help (tx everyone involved!) I got lockstat output, and
the single *really* hot lock on 8.0 is kernel_lock, while everything
else looks quite normal. So for kernel_lock we traced the callers, and
got this:

On a build cluster slave running 7.1_STABLE, building netbsd-7 with
"build.sh -m alpha -j 24" and source, obj, dest, release and tooldirs
all on tmpfs, tools already build, we get:


Elapsed time: 1475.74 seconds.

-- Kernel lock spin

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
100.00  159874    366.69 kernel_lock
 41.76   65404    153.12 kernel_lock  VOP_POLL+43
 19.29   36948     70.74 kernel_lock  VOP_LOCK+43
 13.25   20066     48.59 kernel_lock  tcp_send_wrapper+22
  7.45    8042     27.30 kernel_lock  intr_biglock_wrapper+12
  4.94    6220     18.11 kernel_lock  sleepq_block+1c7
  4.88    5654     17.91 kernel_lock  fileassoc_file_delete+1c
  3.49    7681     12.80 kernel_lock  VOP_UNLOCK+40
  2.59    2558      9.50 kernel_lock  bdev_strategy+52
  1.05    5626      3.85 kernel_lock  VOP_WRITE+48
  0.38     904      1.38 kernel_lock  VOP_READ+48
  0.29     254      1.05 kernel_lock  biodone2+6f
  0.23     103      0.84 kernel_lock  frag6_fasttimo+1a
  0.15     217      0.57 kernel_lock  tcp_input+e27
  0.15     102      0.54 kernel_lock  tcp_input+1366
  0.02      17      0.08 kernel_lock  callout_softclock+38d
  0.02       6      0.06 kernel_lock  ip6flow_slowtimo+e
  0.02      13      0.06 kernel_lock  ip_slowtimo+1a
  0.02      15      0.06 kernel_lock  ip6flow_slowtimo_work+1f
  0.02      17      0.06 kernel_lock  frag6_slowtimo+1b
  0.01      16      0.04 kernel_lock  ipflow_slowtimo_work+1f
  0.01      10      0.02 kernel_lock  ipflow_slowtimo+e
  0.00       1      0.01 kernel_lock  udp_send_wrapper+2b


Same setup, on a build cluster slave running 8.0_RC2 we get:

Elapsed time: 3857.33 seconds.

-- Kernel lock spin

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
100.00 838202502 25353589.88 kernel_lock
 86.70 662529722 21982378.01 kernel_lock  sleepq_block+1b5
  6.27  62286987  1590912.88 kernel_lock  VOP_LOCK+b2
  4.05  54443880  1027059.31 kernel_lock  VOP_LOOKUP+5e
  2.38  51325126   603535.23 kernel_lock  VOP_UNLOCK+70
  0.42   5623585   105948.06 kernel_lock  VOP_GETATTR+5e
  0.05    741051    13894.97 kernel_lock  VOP_POLL+93
  0.02    290699     5546.12 kernel_lock  tcp_send_wrapper+22
  0.02     91076     5167.14 kernel_lock  intr_biglock_wrapper+16
  0.02    206972     4178.13 kernel_lock  VOP_INACTIVE+5d
  0.01    133766     3472.19 kernel_lock  scsipi_adapter_request+63
  0.01    161192     3109.39 kernel_lock  fileassoc_file_delete+20
  0.01    101838     1833.00 kernel_lock  VOP_CREATE+62
  0.01     83442     1812.25 kernel_lock  VOP_WRITE+61
  0.01     64171     1342.68 kernel_lock  VOP_READ+61
  0.00     28336      846.55 kernel_lock  softint_dispatch+dc
  0.00     37851      772.09 kernel_lock  VOP_REMOVE+61
  0.00     10945      606.49 kernel_lock  ipintr+37
  0.00      7435      353.17 kernel_lock  biodone2+6d
  0.00     17025      284.61 kernel_lock  VOP_ACCESS+5d
  0.00      6108      172.84 kernel_lock  frag6_fasttimo+1a
  0.00      1200      103.23 kernel_lock  callout_softclock+2b1
  0.00      2789       75.00 kernel_lock  nd6_timer_work+49
  0.00      2426       70.32 kernel_lock  ip_slowtimo+1a
  0.00      2384       57.25 kernel_lock  frag6_slowtimo+1f
  0.00       818       16.57 kernel_lock  VOP_ABORTOP+94
  0.00       567       14.15 kernel_lock  VFS_SYNC+65
  0.00       352       10.43 kernel_lock  kevent1+692
  0.00       428        8.72 kernel_lock  VOP_LINK+61
  0.00       129        4.86 kernel_lock  udp_send_wrapper+2b
  0.00        58        1.73 kernel_lock  kqueue_register+419
  0.00        56        1.46 kernel_lock  knote_detach+135
  0.00         5        0.27 kernel_lock  arpintr+21
  0.00

Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Martin Husemann
On Fri, Jul 13, 2018 at 06:50:02AM -0400, Thor Lancelot Simon wrote:
> Only the master does much if any disk (actually SSD) I/O.  It must be
> either the mpt driver or the scsi subsystem.  At a _guess_, mpt?  I
> don't have time to look right now.

I couldn't spot any special dma map allocations in mpt.

> This is annoying, but seems unlikely to be the cause of the performance
> regression;

I agree, it is just .. strange.

> I don't think the master runs builds any more, does it?  So
> it shouldn't be doing much and should have tons of CPU and memory
> bandwidth available to move that data.

It does run builds (but only a single instance).

spz compared dmesg output with the old kernel and there was nothing obvious
in there either.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Thor Lancelot Simon
On Fri, Jul 13, 2018 at 09:37:26AM +0200, Martin Husemann wrote:
> 
> Do you happen to know why 
> 
>   vmstat -e | fgrep "bus_dma bounces"
> 
> shows a > 500 rate e.g. on b45? I never see a single bounce on any of my
> amd64 machines. The build slaves seem to do only a few of them though.

Only the master does much if any disk (actually SSD) I/O.  It must be
either the mpt driver or the scsi subsystem.  At a _guess_, mpt?  I
don't have time to look right now.

This is annoying, but seems unlikely to be the cause of the performance
regression; I don't think the master runs builds any more, does it?  So
it shouldn't be doing much and should have tons of CPU and memory
bandwidth available to move that data.

-- 
  Thor Lancelot Simont...@panix.com
 "The two most common variations translate as follows:
illegitimi non carborundum = the unlawful are not silicon carbide
illegitimis non carborundum = the unlawful don't have silicon carbide."


Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Paul Goyette

On Fri, 13 Jul 2018, Martin Husemann wrote:


On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote:

I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
with 128GB, running 8.99.18 - to test if someone wants to provide an
explicit test scenario that can run on top of an existing "production"
environment.


If your production environment is based on -7, it might work.

Steps to reproduce (I hope, as I couldn't):

 - run a netbsd-7 system


Ooops - that would be a show-stopper.  My one-and-only environment is
8.99.18 and I have no other bootable system (and no painless way to
create one).


 - check out netbsd-7 src and xsrc
 - run something (with proper -j) :

build.sh -m alpha -j N build release iso-image

- save the last ~20 lines of the log somewhere
- clean up all build results, tools, obj,  (if on tmpfs this is not needed)
- reboot to a netbsd-8 kernel
- (if on tmpfs, redo the checkout)
- run the same build.sh again


+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--+--++


Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Martin Husemann
On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote:
> I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
> with 128GB, running 8.99.18 - to test if someone wants to provide an
> explicit test scenario that can run on top of an existing "production"
> environment.

If your production environment is based on -7, it might work.

Steps to reproduce (I hope, as I couldn't):

  - run a netbsd-7 system
  - check out netbsd-7 src and xsrc
  - run something (with proper -j) :

build.sh -m alpha -j N build release iso-image

 - save the last ~20 lines of the log somewhere
 - clean up all build results, tools, obj,  (if on tmpfs this is not needed)
 - reboot to a netbsd-8 kernel
 - (if on tmpfs, redo the checkout)
 - run the same build.sh again

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Martin Husemann
On Fri, Jul 13, 2018 at 04:18:39AM +0700, Robert Elz wrote:
> But it is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD
> are not affected (or not nearly as much).

They are, but only by ~20%, while the -7 builds are > 400%.

My test builds on local hardware show +/-3% (I'd call that noise). And
they are not always slower when running -8.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-13 Thread Martin Husemann
On Thu, Jul 12, 2018 at 03:18:08PM -0400, Thor Lancelot Simon wrote:
> On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote:
> > On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:
> > > Are you running the builds from tmpfs to tmpfs, like the build cluster
> > > does?
> > 
> > No, I don't have enough ram to test it that way.
> 
> So if 8.0 has a serious tmpfs regression... we don't yet know.

I *did* use tmpfs, but not for everything.

Do you happen to know why 

vmstat -e | fgrep "bus_dma bounces"

shows a > 500 rate e.g. on b45? I never see a single bounce on any of my
amd64 machines. The build slaves seem to do only a few of them though.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Paul Goyette

On Thu, 12 Jul 2018, Thor Lancelot Simon wrote:


On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote:

On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:

Are you running the builds from tmpfs to tmpfs, like the build cluster
does?


No, I don't have enough ram to test it that way.


So if 8.0 has a serious tmpfs regression... we don't yet know.


I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
with 128GB, running 8.99.18 - to test if someone wants to provide an
explicit test scenario that can run on top of an existing "production"
environment.




+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--+--++


Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Robert Elz
Date:Thu, 12 Jul 2018 15:18:08 -0400
From:Thor Lancelot Simon 
Message-ID:  <20180712191808.ga2...@panix.com>

  | So if 8.0 has a serious tmpfs regression... we don't yet know.

If the most serious problem is something in the basic infrastructure
then it ought to be affecting all of the builds.

But it is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD
are not affected (or not nearly as much).   (That is, the -7 builds end
up taking about twice as long as HEAD and -8 builds, and much longer
than -6 (which is smaller, and uses a much simpler gcc))

I believe the most likely cause is something peculiar to -7 when it runs
on a -8 base: perhaps related to the gcc version, or make, or
binutils, or ... and it might be something that only affects some of the
architectures (I don't have enough info to know).   It could be almost
anything given that scope limitation, but I don't see how something as
basic as tmpfs on -8 can be a possible cause.

kre



Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Thor Lancelot Simon
On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote:
> On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:
> > Are you running the builds from tmpfs to tmpfs, like the build cluster
> > does?
> 
> No, I don't have enough ram to test it that way.

So if 8.0 has a serious tmpfs regression... we don't yet know.

-- 
  Thor Lancelot Simont...@panix.com
 "The two most common variations translate as follows:
illegitimi non carborundum = the unlawful are not silicon carbide
illegitimis non carborundum = the unlawful don't have silicon carbide."


Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Martin Husemann
On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:
> Are you running the builds from tmpfs to tmpfs, like the build cluster
> does?

No, I don't have enough ram to test it that way.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Thor Lancelot Simon
On Thu, Jul 12, 2018 at 05:29:57PM +0200, Martin Husemann wrote:
> On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote:
> > So stay tuned, maybe only Intel to blame ;-)
> 
> Nope, that wasn't it.
> 
> I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly
> repeated the same test on an affected Intel CPU and also could see no
> slowdown.

Are you running the builds from tmpfs to tmpfs, like the build cluster
does?

Thor


Re: 8.0 performance issue when running build.sh?

2018-07-12 Thread Martin Husemann
On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote:
> So stay tuned, maybe only Intel to blame ;-)

Nope, that wasn't it.

I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly
repeated the same test on an affected Intel CPU and also could see no
slowdown.

So this seems to not be a generic NetBSD 8.0 issue, but something strange
with the concrete hardware (maybe a driver bug).

It is never easy - but at least this should not block the remaining 8.0
release cycle.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-11 Thread Kamil Rytarowski
On 11.07.2018 11:47, Takeshi Nakayama wrote:
 Martin Husemann  wrote
> 
>>> Another observation is that grep(1) on one NetBSD server is
>>> significantly slower after the switch from -7 to 8RC1.
>>
>> Please file separate PRs for each (and maybe provide some input files
>> to reproduce the issue).
> 
> Already filed:
>   http://gnats.netbsd.org/53241
> 
> -- Takeshi Nakayama
> 

Time of "LC_ALL=C grep" query:
 0.18 real   0.08 user   0.09 sys

Time of "LC_ALL=pl_PL.UTF-8 grep" query:
15,94 real  15,74 user   0,18 sys


Good catch! It's 200 times slower!





Re: 8.0 performance issue when running build.sh?

2018-07-11 Thread Takeshi Nakayama
>>> Martin Husemann  wrote

> > Another observation is that grep(1) on one NetBSD server is
> > significantly slower after the switch from -7 to 8RC1.
> 
> Please file separate PRs for each (and maybe provide some input files
> to reproduce the issue).

Already filed:
  http://gnats.netbsd.org/53241

-- Takeshi Nakayama


Re: 8.0 performance issue when running build.sh?

2018-07-11 Thread Kamil Rytarowski
On 11.07.2018 09:09, Simon Burge wrote:
> Hi folks,
> 
> Martin Husemann wrote:
> 
>> On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
>>> After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
>>> the ld(1) linker has serious issues with linking Clang/LLVM single
>>> libraries within 20 minutes. This causes frequent timeouts on the NetBSD
>>> buildbot in the LLVM buildfarm. Timeouts were never observed in the
>>> past, today there might be few of them daily.
>>
>> Sounds like a binutils issue (or something like too little RAM available
>> on the machine).
> 
> Probably only tangentially related, but I wasn't able to link an amd64
> GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no
> NOCTF=yes" at least got something built, but it was extraordinarily slow to
> do so.  I never worried about raising a PR for that as I figured the
> world had moved on...
> 
> Cheers,
> Simon.
> 

Excessive RAM usage certainly contributes, but I must defer the
profiling tasks and focus on other things now.  It's better for me to
focus on porting lld (a 10x boost compared to ld on larger C++
codebases) before profiling.

Additionally, that LLVM build machine is remote and serves a public
service.





Re: 8.0 performance issue when running build.sh?

2018-07-11 Thread Simon Burge
Hi folks,

Martin Husemann wrote:

> On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> > After the switch from NetBSD-HEAD (a version from 1 year ago) to 8.0RC2,
> > the ld(1) linker has serious trouble linking single Clang/LLVM
> > libraries within 20 minutes. This causes frequent timeouts on the NetBSD
> > buildbot in the LLVM buildfarm. Timeouts were never observed in the
> > past; today there might be a few of them daily.
>
> Sounds like a binutils issue (or something like too little RAM available
> on the machine).

Probably only tangentially related, but I wasn't able to link an amd64
GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no
NOCTF=yes" at least got something built, but it was extraordinarily slow
to do so.  I never worried about raising a PR for that as I figured the
world had moved on...

Cheers,
Simon.


Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Martin Husemann
On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> After the switch from NetBSD-HEAD (a version from 1 year ago) to 8.0RC2,
> the ld(1) linker has serious trouble linking single Clang/LLVM
> libraries within 20 minutes. This causes frequent timeouts on the NetBSD
> buildbot in the LLVM buildfarm. Timeouts were never observed in the
> past; today there might be a few of them daily.

Sounds like a binutils issue (or something like too little RAM available
on the machine).

> Another observation is that grep(1) on one NetBSD server is
> significantly slower since the switch from -7 to 8RC1.

Please file separate PRs for each (and maybe provide some input files
to reproduce the issue).

Martin




Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Kamil Rytarowski
On 10.07.2018 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
>> I have no scientific data yet, but just noticed that build times on the
>> auto build cluster did rise very dramatically since it has been updated
>> to run NetBSD 8.0 RC2.
>>
>> Since builds move around build slaves sometimes (not exactly randomly,
>> but anyway) I picked the alpha port as an example (the first few
>> architectures in the alphabetical list get build slaves assigned pretty
>> consistently).
> 
> Here is an intermediate result from further experiments and statistics:
> 
>  - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>and not what will be in the final 8.0 release) has a measurable performance
>impact - but it is not the big issue here.
> 
>  - if we ignore netbsd-7* branches, the performance loss is reasonably
>explainable by the SVS penalty - we are going to check that theory soon.
> 
>  - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>interaction with SVS, making those build times sky rocket - if turning
>off SVS does not solve this, we will need to dig deeper.
> 
> So stay tuned, maybe only Intel to blame ;-)
> 
> If anyone has concrete pointers for the last issue (or ideas what to change/
> measure) please speak up.
> 
> Martin
> 

After the switch from NetBSD-HEAD (a version from 1 year ago) to 8.0RC2,
the ld(1) linker has serious trouble linking single Clang/LLVM libraries
within 20 minutes. This causes frequent timeouts on the NetBSD buildbot
in the LLVM buildfarm. Timeouts were never observed in the past; today
there might be a few of them daily.

We were experimenting with disabled SVS, but it didn't help.

Another observation is that grep(1) on one NetBSD server is
significantly slower since the switch from -7 to 8RC1.





Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Maxime Villard

Le 10/07/2018 à 11:01, Martin Husemann a écrit :

On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:

I have no scientific data yet, but just noticed that build times on the
auto build cluster did rise very dramatically since it has been updated
to run NetBSD 8.0 RC2.

Since builds move around build slaves sometimes (not exactly randomly,
but anyway) I picked the alpha port as an example (the first few
architectures in the alphabetical list get build slaves assigned pretty
consistently).


Here is an intermediate result from further experiments and statistics:

  - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
and not what will be in the final 8.0 release) has a measurable performance
impact - but it is not the big issue here.


For the record: EagerFPU has a fixed performance cost, which is saving and
restoring the FPU context during each context switch. LazyFPU, however,
had a variable performance cost: during context switches the FPU state was
kept on the CPU, in the hope that if we switched back to the owner lwp we
would not have to do a save+restore. If we did have to, however, we needed
to send an IPI, and cost(IPI+save+restore) > cost(save+restore).

So LazyFPU may have been less/more expensive than EagerFPU, depending on the
workload/scheduling.

The reason it is more expensive for you is maybe because on your machine each
lwp ("make" thread) stays on the same CPU, and the kpreemptions cause a
save+restore that is not actually necessary, since each CPU always comes back
to the owner lwp. (As you said, you also have the old version of EagerFPU in
RC2, which is more expensive than that of the current -current, so it's part
of the problem too.)

I've already said it, but XSAVEOPT actually eliminates this problem, since it
performs the equivalent of LazyFPU (not saving+restoring when not needed)
without requiring an IPI.


  - if we ignore netbsd-7* branches, the performance loss is reasonably
explainable by the SVS penalty - we are going to check that theory soon.

  - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
interaction with SVS, making those build times sky rocket - if turning
off SVS does not solve this, we will need to dig deeper.

So stay tuned, maybe only Intel to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/
measure) please speak up.


Not sure this is related, but it seems that the performance of build.sh on
netbsd-current is oscillating as fuck. Maybe I just hadn't noticed that it
was oscillating this much before, but right now a

build.sh -j 4 kernel=GENERIC

is in [5min; 5min35s], even with SpectreV2/SpectreV4/SVS/EagerFPU all
disabled. It seems to me that at one time doing two or three builds was
enough and you would get an oscillation that was <5s. Now it looks like I
have to do more than 5 builds to get a relevant average.

Maxime



Re: 8.0 performance issue when running build.sh?

2018-07-10 Thread Martin Husemann
On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> I have no scientific data yet, but just noticed that build times on the
> auto build cluster did rise very dramatically since it has been updated
> to run NetBSD 8.0 RC2.
> 
> Since builds move around build slaves sometimes (not exactly randomly,
> but anyway) I picked the alpha port as an example (the first few
> architectures in the alphabetical list get build slaves assigned pretty
> consistently).

Here is an intermediate result from further experiments and statistics:

 - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
   and not what will be in the final 8.0 release) has a measurable performance
   impact - but it is not the big issue here.

 - if we ignore netbsd-7* branches, the performance loss is reasonably
   explainable by the SVS penalty - we are going to check that theory soon.

 - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
   interaction with SVS, making those build times sky rocket - if turning
   off SVS does not solve this, we will need to dig deeper.

So stay tuned, maybe only Intel to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/
measure) please speak up.

Martin


Re: 8.0 performance issue when running build.sh?

2018-07-06 Thread Maxime Villard

Le 06/07/2018 à 16:48, Martin Husemann a écrit :

On Fri, Jul 06, 2018 at 04:40:48PM +0200, Maxime Villard wrote:

These are all successful builds of HEAD for alpha that happened after June 1:


What does that mean, are you building something *on* an Alpha CPU, or are
you building the Alpha port on another CPU?


It is all about the releng auto build cluster, which is running amd64.


Then it is likely caused by two things:

 * in EagerFPU I made a mistake initially, and it caused the FPU state to
   be restored when the kernel was leaving a softint. I sent a pullup
   already for netbsd-8, but it hasn't yet been applied. The fix removes a
   save+restore, which improves performance.

 * the XSTATE_BV bitmap is not zeroed at execve time in NetBSD-8. It
   probably causes some performance loss, because XRSTOR always restores
    all the FPU states instead of just the ones we used. In recent CPUs there
   are many FPU states and we use only a few in the base system, so the
   extra restoring costs us. Even more so with EagerFPU, I guess, because
   we do save+restore unconditionally, rather than on-demand. I fixed that
   in October 2017 in -current, but it didn't make it to -8. I guess it
   will have to, now.

Beyond that we need to use XSAVEOPT, too.

Maxime