re: 8.0 performance issue when running build.sh?
> > >> 100.00   2054   14.18  kernel_lock
> > >>  47.43    846    6.72  kernel_lock  fileassoc_file_delete+20
> > >>  23.73    188    3.36  kernel_lock  intr_biglock_wrapper+16
> > >>  16.01    203    2.27  kernel_lock  scsipi_adapter_request+63

also note that this is 14.18ms of locked time across 26 minutes.
this isn't a contention point.

.mrg.
Re: 8.0 performance issue when running build.sh?
> On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
>
> Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> it still used only by veriexec?  We can easily option that out of the
> build box kernels.

Indeed.  And there are better ways to do what veriexec does, in any case.

-- thorpej
Re: 8.0 performance issue when running build.sh?
I would be interested to finish that off, I need to make some time to get
to doing it though.

I have been sitting on some changes to veriexec for ~ years that change it
from locking everything to using reference counts and condition variables,
which removes some nasty hacks I did.  I have not committed the changes
because the kernel would sometimes deadlock and I was having trouble
tracking down why.  Perhaps I was looking in the wrong spot for the error
and it was fileassoc all along that was causing the deadlock.

----- Original Message -----
From: "Mindaugas Rasiukevicius"
To: "Jason Thorpe"
Cc:
Sent: Fri, 10 Aug 2018 00:12:23 +0100
Subject: Re: 8.0 performance issue when running build.sh?

Jason Thorpe wrote:
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
> >
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the
> > build box kernels.
>
> Indeed.  And there are better ways to do what veriexec does, in any case.

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- Mindaugas
Re: 8.0 performance issue when running build.sh?
Jason Thorpe wrote:
> > On Aug 9, 2018, at 10:40 AM, Thor Lancelot Simon wrote:
> >
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the
> > build box kernels.
>
> Indeed.  And there are better ways to do what veriexec does, in any case.

Many years ago I wrote a diff to make fileassoc MP-safe:

http://www.netbsd.org/~rmind/fileassoc.diff

If somebody wants to finish -- I am glad to help.

-- Mindaugas
Re: 8.0 performance issue when running build.sh?
On Fri, Aug 10, 2018 at 12:29:49AM +0200, Joerg Sonnenberger wrote:
> On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> > 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> > >> 100.00   2054   14.18  kernel_lock
> > >>  47.43    846    6.72  kernel_lock  fileassoc_file_delete+20
> > >>  23.73    188    3.36  kernel_lock  intr_biglock_wrapper+16
> > >>  16.01    203    2.27  kernel_lock  scsipi_adapter_request+63
> > > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > > it still used only by veriexec?  We can easily option that out of the
> > > build box kernels.
> >
> > Or even better, make it less heavy?
> >
> > It's not really intuitive that you could improve filesystem
> > performance by removing this obscure component.
>
> If it is not in use, fileassoc_file_delete will short cut already.

...and of course, the check seems to be just useless.  So yes, it should
be possible to make it much less heavy.

Joerg
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 08:14:57PM +0200, Jaromír Doleček wrote:
> 2018-08-09 19:40 GMT+02:00 Thor Lancelot Simon :
> > On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> >> 100.00   2054   14.18  kernel_lock
> >>  47.43    846    6.72  kernel_lock  fileassoc_file_delete+20
> >>  23.73    188    3.36  kernel_lock  intr_biglock_wrapper+16
> >>  16.01    203    2.27  kernel_lock  scsipi_adapter_request+63
> > Actually, I wonder if we could kill off the time spent by fileassoc.  Is
> > it still used only by veriexec?  We can easily option that out of the build
> > box kernels.
>
> Or even better, make it less heavy?
>
> It's not really intuitive that you could improve filesystem
> performance by removing this obscure component.

If it is not in use, fileassoc_file_delete will short cut already.

Joerg
Re: 8.0 performance issue when running build.sh?
On Thu, Aug 09, 2018 at 10:10:07AM +0200, Martin Husemann wrote:
> With the patch applied:
>
> Elapsed time: 1564.93 seconds.
>
> -- Kernel lock spin
>
> Total%  Count  Time/ms  Lock         Caller
> ------  -----  -------  -----------  -------------------------
> 100.00   2054    14.18  kernel_lock
>  47.43    846     6.72  kernel_lock  fileassoc_file_delete+20
>  23.73    188     3.36  kernel_lock  intr_biglock_wrapper+16
>  16.01    203     2.27  kernel_lock  scsipi_adapter_request+63
>   5.29    662     0.75  kernel_lock  VOP_POLL+93
>   5.29     95     0.75  kernel_lock  biodone2+81
>   0.91     15     0.13  kernel_lock  sleepq_block+1c5
>   0.60     21     0.08  kernel_lock  frag6_fasttimo+1a
>   0.29      9     0.04  kernel_lock  ip_slowtimo+1a
>   0.27      2     0.04  kernel_lock  VFS_SYNC+65
>   0.07      2     0.01  kernel_lock  callout_softclock+42c
>   0.06      3     0.01  kernel_lock  nd6_timer_work+49
>   0.05      4     0.01  kernel_lock  frag6_slowtimo+1f
>   0.01      4     0.00  kernel_lock  kevent1+698
>
> so .. no need to worry about kernel_lock for this load any more.

Actually, I wonder if we could kill off the time spent by fileassoc.  Is
it still used only by veriexec?  We can easily option that out of the
build box kernels.

-- 
 Thor Lancelot Simon                                        t...@panix.com
 "Whether or not there's hope for change is not the question.  If you
 want to be a free person, you don't stand up for human rights because
 it will work, but because it is right."  --Andrei Sakharov
Re: 8.0 performance issue when running build.sh?
With the patch applied:

Elapsed time: 1564.93 seconds.

-- Kernel lock spin

Total%  Count  Time/ms  Lock         Caller
------  -----  -------  -----------  -------------------------
100.00   2054    14.18  kernel_lock
 47.43    846     6.72  kernel_lock  fileassoc_file_delete+20
 23.73    188     3.36  kernel_lock  intr_biglock_wrapper+16
 16.01    203     2.27  kernel_lock  scsipi_adapter_request+63
  5.29    662     0.75  kernel_lock  VOP_POLL+93
  5.29     95     0.75  kernel_lock  biodone2+81
  0.91     15     0.13  kernel_lock  sleepq_block+1c5
  0.60     21     0.08  kernel_lock  frag6_fasttimo+1a
  0.29      9     0.04  kernel_lock  ip_slowtimo+1a
  0.27      2     0.04  kernel_lock  VFS_SYNC+65
  0.07      2     0.01  kernel_lock  callout_softclock+42c
  0.06      3     0.01  kernel_lock  nd6_timer_work+49
  0.05      4     0.01  kernel_lock  frag6_slowtimo+1f
  0.01      4     0.00  kernel_lock  kevent1+698

so .. no need to worry about kernel_lock for this load any more.

Mindaugas, can you please commit your patch and request pullup?

Martin
Re: 8.0 performance issue when running build.sh?
> On Aug 6, 2018, at 12:17 PM, Rhialto wrote:
>
>  21.96  24626  82447.93  kernel_lock  ip_slowtimo+1a

The work that's happening here looks like a scalability nightmare,
regardless of holding the kernel lock or not.

A couple of things:

1- It should do a better job of determining if there's any work that it
   actually has to do.  As it is, it's going to lock the kernel_lock and
   traverse all of the buckets in a hash table even if there aren't any
   entries in it.  Even in the NET_MPSAFE scenario, that's pointless work.

2- The kernel_lock, by its nature, is the lock-of-last-resort, and code
   needs to be doing everything it can to defer taking that lock until it
   is deemed absolutely necessary.  Even in the not-NET_MPSAFE scenario,
   there should be a way to determine that IP fragment expiration
   processing needs to actually occur before taking the kernel_lock.

-- thorpej
Re: 8.0 performance issue when running build.sh?
rhia...@falu.nl (Rhialto) writes:

> From the caller list, I get the impression that the statistics include
> (potentially) all system activity, not just work being done by/on behalf
> of the specified program.  Otherwise I can't explain the network related
> names.  All distfiles should have been there already (it was a rebuild of
> existing packages but for 8.0).

Yes, lockstat data includes all system activity.

>  29.41  59522  110425.74  kernel_lock  cdev_poll+72
>  21.96  24626   82447.93  kernel_lock  ip_slowtimo+1a
>  20.04  23020   75230.65  kernel_lock  frag6_slowtimo+1f
>  10.09  26272   37891.06  kernel_lock  frag6_fasttimo+1a

Network isn't MPSAFE yet, so it is natural that it generates contention
on a busy system.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."
Re: 8.0 performance issue when running build.sh?
Martin Husemann wrote:
> So here is a more detailed analysis using flamegraphs:
>
> https://netbsd.org/~martin/bc-perf/
>
> All operations happen on tmpfs, and the locking there is a significant
> part of the game - however, lots of mostly idle time is wasted and it is
> not clear to me what is going on.

Just from a very quick look, it seems like a regression introduced with
the vcache changes: the MP-safe flag is set too late and is not inherited
by the root vnode.

http://www.netbsd.org/~rmind/tmpfs_mount_fix.diff

-- Mindaugas
Re: 8.0 performance issue when running build.sh?
I happened to be rebuilding some packages on amd64/8.0 while I was
reading your article, so I thought I might as well measure something
myself.  My results are not directly comparable to yours.  I build from
local rotating disk, not a ram disk.  But I have 32 GB of RAM so there
should be a lot of caching.

From the caller list, I get the impression that the statistics include
(potentially) all system activity, not just work being done by/on behalf
of the specified program.  Otherwise I can't explain the network related
names.  All distfiles should have been there already (it was a rebuild of
existing packages but for 8.0).

NetBSD murthe.falu.nl 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51
UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

pkg_comp:default.conf# lockstat -L kernel_lock pkg_chk -ak
...
Elapsed time: 47954.90 seconds.

-- Kernel lock spin

Total%   Count    Time/ms  Lock         Caller
------  ------  ---------  -----------  --------------------------
100.00  618791  375442.52  kernel_lock
 29.41   59522  110425.74  kernel_lock  cdev_poll+72
 21.96   24626   82447.93  kernel_lock  ip_slowtimo+1a
 20.04   23020   75230.65  kernel_lock  frag6_slowtimo+1f
 10.09   26272   37891.06  kernel_lock  frag6_fasttimo+1a
  7.50   58970   28150.68  kernel_lock  VOP_POLL+93
  3.16   11940   11860.16  kernel_lock  sleepq_block+1b5
  2.99   56296   11207.05  kernel_lock  VOP_LOCK+b2
  1.69  104084    6338.54  kernel_lock  bdev_strategy+7f
  0.88    3939    3311.60  kernel_lock  fileassoc_file_delete+20
  0.72    5289    2700.97  kernel_lock  VOP_IOCTL+65
  0.45     703    1680.33  kernel_lock  nd6_timer_work+49
  0.40    3102    1500.71  kernel_lock  VOP_LOOKUP+5e
  0.26   17808     967.32  kernel_lock  VOP_UNLOCK+70
  0.14    6354     514.59  kernel_lock  VOP_WRITE+61
  0.06     115     236.77  kernel_lock  tcp_timer_keep+48
  0.04    7082     168.23  kernel_lock  tcp_rcvd_wrapper+18
  0.04   85504     144.38  kernel_lock  callout_softclock+2b0
  0.03    4775     103.15  kernel_lock  kevent1+692
  0.02    3725      79.60  kernel_lock  kqueue_register+419
  0.01   41908      56.07  kernel_lock  intr_biglock_wrapper+16
  0.01   56054      50.86  kernel_lock  biodone2+6d
  0.01      94      43.05  kernel_lock  cdev_read+72
  0.01      23      40.65  kernel_lock  udp_send_wrapper+2b
  0.01    6199      38.81  kernel_lock  VOP_READ+61
  0.01      20      32.44  kernel_lock  cdev_ioctl+88
  0.01     178      30.20  kernel_lock  VOP_GETATTR+5e
  0.01    1050      27.36  kernel_lock  knote_detach+134
  0.01      60      25.10  kernel_lock  VFS_SYNC+65
  0.01      43      21.64  kernel_lock  VOP_READDIR+6d
  0.01     635      19.13  kernel_lock  tcp_send_wrapper+22
  0.00     101      16.98  kernel_lock  cdev_open+be
  0.00       2      13.49  kernel_lock  ip_drain+e
  0.00      23      11.60  kernel_lock  udp6_send_wrapper+2b
  0.00    6764       9.57  kernel_lock  softint_dispatch+dc
  0.00       9       9.09  kernel_lock  tcp_timer_rexmt+52
  0.00       3       5.78  kernel_lock  tcp_attach_wrapper+18
  0.00       2       5.37  kernel_lock  mld_timeo+16
  0.00      21       5.10  kernel_lock  tcp_detach_wrapper+16
  0.00      28       5.07  kernel_lock  VOP_FCNTL+64
  0.00       1       5.03  kernel_lock  tcp_drain+1d
  0.00      31       3.28  kernel_lock  bdev_ioctl+88
  0.00     484       2.70  kernel_lock  doifioctl+8f3
  0.00     107       1.22  kernel_lock  VOP_ACCESS+5d
  0.00     809       0.91  kernel_lock  ip6intr+1f
  0.00       1       0.65  kernel_lock  rip6_send_wrapper+34
  0.00     101       0.47  kernel_lock  udp6_detach_wrapper+14
  0.00      62       0.37  kernel_lock  udp_attach_wrapper+16
  0.00      55       0.31  kernel_lock  udp_detach_wrapper+16
  0.00     261       0.18  kernel_lock  kqueue_register+25b
  0.00      37       0.13  kernel_lock  udp6_attach_wrapper+1a
  0.00     144       0.13  kernel_lock  ipintr+37
  0.00     216       0.11  kernel_lock  VOP_SEEK+9c
  0.00      32       0.08  kernel_lock  udp6_ctloutput_wrapper+1f
  0.00      12       0.03  kernel_lock  udp_ctloutput_wrapper+1f
  0.00       5       0.02  kernel_lock  tcp_ctloutput_wrapper+1f
  0.00      33       0.02  kernel_lock  usb_transfer_complete+122
  0.00       6       0.02  kernel_lock  tcp_peeraddr_wrapper+1b
  0.00      10       0.02  kernel_lock  tcp_disconnect_wrapper+18
  0.00       5       0.02  kernel_lock  udp6_bind_wrapper+1e
  0.00      10       0.01  kernel_lock
Re: 8.0 performance issue when running build.sh?
So here is a more detailed analysis using flamegraphs:

https://netbsd.org/~martin/bc-perf/

All operations happen on tmpfs, and the locking there is a significant
part of the game - however, lots of mostly idle time is wasted and it is
not clear to me what is going on.

Initial tests hint at -current having the same issue.

Any ideas?

Martin
Re: 8.0 performance issue when running build.sh?
So after more investigation here is an update:

The effect depends highly on "too high" -j args to build.sh.  As a
stopgap hack I reduced them on the build cluster and now the netbsd-7
builds are slow, but not super slow (and all other builds are slightly
slower).

I tested building of netbsd-7 on a netbsd-8 build cluster slave a lot,
and the ideal -j value is 11 or 12 (only!) for a single build.  Builds
for other branches apparently do better with higher -j values, but I have
not done extensive testing there (yet).

With a bit of help (tx everyone involved!) I got lockstat output, and the
single *really* hot lock on 8.0 is kernel_lock, while everything else
looks quite normal.  So for kernel_lock we traced the callers, and got
this:

On a build cluster slave running 7.1_STABLE, building netbsd-7 with
"build.sh -m alpha -j 24" and source, obj, dest, release and tooldirs all
on tmpfs, tools already built, we get:

Elapsed time: 1475.74 seconds.

-- Kernel lock spin

Total%   Count  Time/ms  Lock         Caller
------  ------  -------  -----------  -------------------------
100.00  159874   366.69  kernel_lock
 41.76   65404   153.12  kernel_lock  VOP_POLL+43
 19.29   36948    70.74  kernel_lock  VOP_LOCK+43
 13.25   20066    48.59  kernel_lock  tcp_send_wrapper+22
  7.45    8042    27.30  kernel_lock  intr_biglock_wrapper+12
  4.94    6220    18.11  kernel_lock  sleepq_block+1c7
  4.88    5654    17.91  kernel_lock  fileassoc_file_delete+1c
  3.49    7681    12.80  kernel_lock  VOP_UNLOCK+40
  2.59    2558     9.50  kernel_lock  bdev_strategy+52
  1.05    5626     3.85  kernel_lock  VOP_WRITE+48
  0.38     904     1.38  kernel_lock  VOP_READ+48
  0.29     254     1.05  kernel_lock  biodone2+6f
  0.23     103     0.84  kernel_lock  frag6_fasttimo+1a
  0.15     217     0.57  kernel_lock  tcp_input+e27
  0.15     102     0.54  kernel_lock  tcp_input+1366
  0.02      17     0.08  kernel_lock  callout_softclock+38d
  0.02       6     0.06  kernel_lock  ip6flow_slowtimo+e
  0.02      13     0.06  kernel_lock  ip_slowtimo+1a
  0.02      15     0.06  kernel_lock  ip6flow_slowtimo_work+1f
  0.02      17     0.06  kernel_lock  frag6_slowtimo+1b
  0.01      16     0.04  kernel_lock  ipflow_slowtimo_work+1f
  0.01      10     0.02  kernel_lock  ipflow_slowtimo+e
  0.00       1     0.01  kernel_lock  udp_send_wrapper+2b

Same setup, on a build cluster slave running 8.0_RC2 we get:

Elapsed time: 3857.33 seconds.

-- Kernel lock spin

Total%      Count      Time/ms  Lock         Caller
------  ---------  -----------  -----------  -------------------------
100.00  838202502  25353589.88  kernel_lock
 86.70  662529722  21982378.01  kernel_lock  sleepq_block+1b5
  6.27   62286987   1590912.88  kernel_lock  VOP_LOCK+b2
  4.05   54443880   1027059.31  kernel_lock  VOP_LOOKUP+5e
  2.38   51325126    603535.23  kernel_lock  VOP_UNLOCK+70
  0.42    5623585    105948.06  kernel_lock  VOP_GETATTR+5e
  0.05     741051     13894.97  kernel_lock  VOP_POLL+93
  0.02     290699      5546.12  kernel_lock  tcp_send_wrapper+22
  0.02      91076      5167.14  kernel_lock  intr_biglock_wrapper+16
  0.02     206972      4178.13  kernel_lock  VOP_INACTIVE+5d
  0.01     133766      3472.19  kernel_lock  scsipi_adapter_request+63
  0.01     161192      3109.39  kernel_lock  fileassoc_file_delete+20
  0.01     101838      1833.00  kernel_lock  VOP_CREATE+62
  0.01      83442      1812.25  kernel_lock  VOP_WRITE+61
  0.01      64171      1342.68  kernel_lock  VOP_READ+61
  0.00      28336       846.55  kernel_lock  softint_dispatch+dc
  0.00      37851       772.09  kernel_lock  VOP_REMOVE+61
  0.00      10945       606.49  kernel_lock  ipintr+37
  0.00       7435       353.17  kernel_lock  biodone2+6d
  0.00      17025       284.61  kernel_lock  VOP_ACCESS+5d
  0.00       6108       172.84  kernel_lock  frag6_fasttimo+1a
  0.00       1200       103.23  kernel_lock  callout_softclock+2b1
  0.00       2789        75.00  kernel_lock  nd6_timer_work+49
  0.00       2426        70.32  kernel_lock  ip_slowtimo+1a
  0.00       2384        57.25  kernel_lock  frag6_slowtimo+1f
  0.00        818        16.57  kernel_lock  VOP_ABORTOP+94
  0.00        567        14.15  kernel_lock  VFS_SYNC+65
  0.00        352        10.43  kernel_lock  kevent1+692
  0.00        428         8.72  kernel_lock  VOP_LINK+61
  0.00        129         4.86  kernel_lock  udp_send_wrapper+2b
  0.00         58         1.73  kernel_lock  kqueue_register+419
  0.00         56         1.46  kernel_lock  knote_detach+135
  0.00          5         0.27  kernel_lock  arpintr+21
  0.00
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 06:50:02AM -0400, Thor Lancelot Simon wrote:
> Only the master does much if any disk (actually SSD) I/O.  It must be
> either the mpt driver or the scsi subsystem.  At a _guess_, mpt?  I
> don't have time to look right now.

I couldn't spot any special dma map allocations in mpt.

> This is annoying, but seems unlikely to be the cause of the performance
> regression;

I agree, it is just .. strange.

> I don't think the master runs builds any more, does it?  So
> it shouldn't be doing much and should have tons of CPU and memory
> bandwidth available to move that data.

It does run builds (but only a single instance).

spz compared dmesg output with the old kernel and there was nothing
obvious in there either.

Martin
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 09:37:26AM +0200, Martin Husemann wrote:
> Do you happen to know why
>
> vmstat -e | fgrep "bus_dma bounces"
>
> shows a > 500 rate e.g. on b45?  I never see a single bounce on any of my
> amd64 machines.  The build slaves seem to do only a few of them though.

Only the master does much if any disk (actually SSD) I/O.  It must be
either the mpt driver or the scsi subsystem.  At a _guess_, mpt?  I don't
have time to look right now.

This is annoying, but seems unlikely to be the cause of the performance
regression; I don't think the master runs builds any more, does it?  So
it shouldn't be doing much and should have tons of CPU and memory
bandwidth available to move that data.

-- 
 Thor Lancelot Simon                                        t...@panix.com
 "The two most common variations translate as follows:
   illegitimi non carborundum = the unlawful are not silicon carbide
   illegitimis non carborundum = the unlawful don't have silicon carbide."
Re: 8.0 performance issue when running build.sh?
On Fri, 13 Jul 2018, Martin Husemann wrote:
> On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote:
> > I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
> > with 128GB, running 8.99.18 - to test if someone wants to provide an
> > explicit test scenario that can run on top of an existing "production"
> > environment.
>
> If your production environment is based on -7, it might work.
>
> Steps to reproduce (I hope, as I couldn't):
>
> - run a netbsd-7 system

Ooops - that would be a show-stopper.  My one-and-only environment is
8.99.18 and I have no other bootable system (and no painless way to
create one).

> - check out netbsd-7 src and xsrc
> - run something (with proper -j): build.sh -m alpha -j N build release iso-image
> - save the last ~20 lines of the log somewhere
> - clean up all build results, tools, obj, (if on tmpfs this is not needed)
> - reboot to a netbsd-8 kernel
> - (if on tmpfs, redo the checkout)
> - run the same build.sh again

+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 05:55:17AM +0800, Paul Goyette wrote:
> I have a system with (probably) enough RAM - amd64, 8/16 core/threads,
> with 128GB, running 8.99.18 - to test if someone wants to provide an
> explicit test scenario that can run on top of an existing "production"
> environment.

If your production environment is based on -7, it might work.

Steps to reproduce (I hope, as I couldn't):

- run a netbsd-7 system
- check out netbsd-7 src and xsrc
- run something (with proper -j): build.sh -m alpha -j N build release iso-image
- save the last ~20 lines of the log somewhere
- clean up all build results, tools, obj, (if on tmpfs this is not needed)
- reboot to a netbsd-8 kernel
- (if on tmpfs, redo the checkout)
- run the same build.sh again

Martin
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 13, 2018 at 04:18:39AM +0700, Robert Elz wrote:
> But it is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD
> are not affected (or not nearly as much).

They are, but only by ~20%, while the -7 builds are > 400%.

My test builds on local hardware show +/-3% (I'd call that noise).  And
they are not always slower when running -8.

Martin
Re: 8.0 performance issue when running build.sh?
    Date:        Thu, 12 Jul 2018 15:18:08 -0400
    From:        Thor Lancelot Simon
    Message-ID:  <20180712191808.ga2...@panix.com>

  | So if 8.0 has a serious tmpfs regression...

we don't yet know.  If the most serious problem is something in the basic
infrastructure then it ought to be affecting all of the builds.  But it
is only really serious on the netbsd-7(-*) builds, -6 -8 and HEAD are not
affected (or not nearly as much).

(That is, the -7 builds end up taking about twice as long as HEAD and -8
builds, and much longer than -6 (which is smaller, and uses a much
simpler gcc).)

I believe the most likely cause is something peculiar to -7, when it runs
on a -8 base, itself, perhaps related to the gcc version, or make, or
binutils, or ... and might be something that only affects some of the
architectures (I don't have enough info to know).

It could be almost anything given that scope limitation, but I don't see
how something basic like tmpfs on -8 can be a possible cause.

kre
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 07:39:10PM +0200, Martin Husemann wrote:
> On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:
> > Are you running the builds from tmpfs to tmpfs, like the build cluster
> > does?
>
> No, I don't have enough ram to test it that way.

So if 8.0 has a serious tmpfs regression... we don't yet know.

-- 
 Thor Lancelot Simon                                        t...@panix.com
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 12:30:39PM -0400, Thor Lancelot Simon wrote:
> Are you running the builds from tmpfs to tmpfs, like the build cluster
> does?

No, I don't have enough ram to test it that way.

Martin
Re: 8.0 performance issue when running build.sh?
On Thu, Jul 12, 2018 at 05:29:57PM +0200, Martin Husemann wrote:
> On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote:
> > So stay tuned, maybe only Intel to blame ;-)
>
> Nope, that wasn't it.
>
> I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly
> repeated the same test on an affected Intel CPU and also could see no
> slowdown.

Are you running the builds from tmpfs to tmpfs, like the build cluster
does?

Thor
Re: 8.0 performance issue when running build.sh?
On Tue, Jul 10, 2018 at 11:01:01AM +0200, Martin Husemann wrote:
> So stay tuned, maybe only Intel to blame ;-)

Nope, that wasn't it.

I failed to reproduce *any* slowdown on an AMD CPU locally, maya@ kindly
repeated the same test on an affected Intel CPU and also could see no
slowdown.

So this seems to not be a generic NetBSD 8.0 issue, but something strange
with the concrete hardware (maybe a driver bug).

It is never easy - but at least this should not block the remaining 8.0
release cycle.

Martin
Re: 8.0 performance issue when running build.sh?
On 11.07.2018 11:47, Takeshi Nakayama wrote:
> >>> Martin Husemann wrote
> > > Another observation is that grep(1) on one NetBSD server is
> > > significantly slower since the switch from -7 to 8RC1.
> >
> > Please file separate PRs for each (and maybe provide some input files
> > to reproduce the issue).
>
> Already filed:
> http://gnats.netbsd.org/53241
>
> -- Takeshi Nakayama

Time of "LC_ALL=C grep" query:

        0.18 real         0.08 user         0.09 sys

Time of "LC_ALL=pl_PL.UTF-8 grep":

       15,94 real        15,74 user         0,18 sys

Good catch! It's 200 times slower!
Re: 8.0 performance issue when running build.sh?
>>> Martin Husemann wrote
> > Another observation is that grep(1) on one NetBSD server is
> > significantly slower since the switch from -7 to 8RC1.
>
> Please file separate PRs for each (and maybe provide some input files
> to reproduce the issue).

Already filed:
http://gnats.netbsd.org/53241

-- Takeshi Nakayama
Re: 8.0 performance issue when running build.sh?
On 11.07.2018 09:09, Simon Burge wrote:
> Hi folks,
>
> Martin Husemann wrote:
>
> > On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> > > After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
> > > the ld(1) linker has serious issues with linking single Clang/LLVM
> > > libraries within 20 minutes.  This causes frequent timeouts on the
> > > NetBSD buildbot in the LLVM buildfarm.  Timeouts were never observed
> > > in the past; today there might be a few of them daily.
> >
> > Sounds like a binutils issue (or something like too little RAM available
> > on the machine).
>
> Probably only tangentially related, but I wasn't able to link an amd64
> GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no
> NOCTF=yes" at least got something built but was extraordinarily slow to
> do so.  I never worried about raising a PR for that as I figured the
> world had moved on...
>
> Cheers,
> Simon.

Excessive RAM usage certainly contributes, but I must defer the profiling
tasks and focus on other things now.  It's better for me to focus on
porting lld (a 10x boost compared to ld in a larger C++ codebase) before
profiling.

Additionally, that LLVM build machine is remote and serves a public
service.
Re: 8.0 performance issue when running build.sh?
Hi folks,

Martin Husemann wrote:
> On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> > After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
> > the ld(1) linker has serious issues with linking single Clang/LLVM
> > libraries within 20 minutes.  This causes frequent timeouts on the
> > NetBSD buildbot in the LLVM buildfarm.  Timeouts were never observed
> > in the past; today there might be a few of them daily.
>
> Sounds like a binutils issue (or something like too little RAM available
> on the machine).

Probably only tangentially related, but I wasn't able to link an amd64
GENERIC kernel on a machine with 512MB of RAM, and even using "MKCTF=no
NOCTF=yes" at least got something built but was extraordinarily slow to
do so.  I never worried about raising a PR for that as I figured the
world had moved on...

Cheers,
Simon.
Re: 8.0 performance issue when running build.sh?
On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
> the ld(1) linker has serious issues with linking single Clang/LLVM
> libraries within 20 minutes.  This causes frequent timeouts on the
> NetBSD buildbot in the LLVM buildfarm.  Timeouts were never observed
> in the past; today there might be a few of them daily.

Sounds like a binutils issue (or something like too little RAM available
on the machine).

> Another observation is that grep(1) on one NetBSD server is
> significantly slower since the switch from -7 to 8RC1.

Please file separate PRs for each (and maybe provide some input files
to reproduce the issue).

Martin
Re: 8.0 performance issue when running build.sh?
On 10.07.2018 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> > I have no scientific data yet, but just noticed that build times on the
> > auto build cluster did rise very dramatically since it has been updated
> > to run NetBSD 8.0 RC2.
> >
> > Since builds move around build slaves sometimes (not exactly randomly,
> > but anyway) I picked the alpha port as an example (the first few
> > architectures in the alphabetical list get build slaves assigned pretty
> > consistently).
>
> Here is an intermediate result from further experiments and statistics:
>
> - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>   and not what will be in the final 8.0 release) has a measurable
>   performance impact - but it is not the big issue here.
>
> - if we ignore netbsd-7* branches, the performance loss is reasonably
>   explainable by the SVS penalty - we are going to check that theory soon.
>
> - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>   interaction with SVS, making those build times sky rocket - if turning
>   off SVS does not solve this, we will need to dig deeper.
>
> So stay tuned, maybe only Intel to blame ;-)
>
> If anyone has concrete pointers for the last issue (or ideas what to
> change/measure) please speak up.
>
> Martin

After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
the ld(1) linker has serious issues with linking single Clang/LLVM
libraries within 20 minutes.  This causes frequent timeouts on the NetBSD
buildbot in the LLVM buildfarm.  Timeouts were never observed in the
past; today there might be a few of them daily.

We were experimenting with disabled SVS, but it didn't help.

Another observation is that grep(1) on one NetBSD server is significantly
slower since the switch from -7 to 8RC1.
Re: 8.0 performance issue when running build.sh?
On 10/07/2018 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> > I have no scientific data yet, but just noticed that build times on the
> > auto build cluster did rise very dramatically since it has been updated
> > to run NetBSD 8.0 RC2.
> >
> > Since builds move around build slaves sometimes (not exactly randomly,
> > but anyway) I picked the alpha port as an example (the first few
> > architectures in the alphabetical list get build slaves assigned pretty
> > consistently).
>
> Here is an intermediate result from further experiments and statistics:
>
> - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>   and not what will be in the final 8.0 release) has a measurable
>   performance impact - but it is not the big issue here.

For the record: EagerFPU has a fixed performance cost, which is
saving+restoring the FPU context during each context switch.

LazyFPU, however, had a variable performance cost: during context
switches the FPU state was kept on the CPU, in the hope that if we
switched back to the owner lwp we may not have had to do a save+restore.
If we did have to, however, we needed to send an IPI, and
cost(IPI+save+restore) > cost(save+restore).

So LazyFPU may have been less/more expensive than EagerFPU, depending on
the workload/scheduling.  The reason it is more expensive for you is
maybe because on your machine each lwp ("make" thread) stays on the same
CPU, and the kpreemptions cause a save+restore that is not actually
necessary since each CPU always comes back to the owner lwp.

(As you said, also, you have the old version of EagerFPU in RC2, which is
more expensive than that of the current -current, so it's part of the
problem too.)

I've already said it, but XSAVEOPT actually eliminates this problem,
since it performs the equivalent of LazyFPU (not saving+restoring when
not needed) without requiring an IPI.

> - if we ignore netbsd-7* branches, the performance loss is reasonably
>   explainable by the SVS penalty - we are going to check that theory soon.
>
> - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>   interaction with SVS, making those build times sky rocket - if turning
>   off SVS does not solve this, we will need to dig deeper.
>
> So stay tuned, maybe only Intel to blame ;-)
>
> If anyone has concrete pointers for the last issue (or ideas what to
> change/measure) please speak up.

Not sure this is related, but it seems that the performance of build.sh
on netbsd-current is oscillating as fuck.  Maybe I just hadn't noticed
that it was oscillating this much before, but right now a "build.sh -j 4
kernel=GENERIC" is in [5min; 5min35s], even with
SpectreV2/SpectreV4/SVS/EagerFPU all disabled.  It seems to me that at a
time doing two or three builds was enough and you would get an
oscillation that was <5s.  Now it looks like I have to do more than 5
builds to get a relevant average.

Maxime
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> I have no scientific data yet, but just noticed that build times on the
> auto build cluster did rise very dramatically since it has been updated
> to run NetBSD 8.0 RC2.
>
> Since builds move around build slaves sometimes (not exactly randomly,
> but anyway) I picked the alpha port as an example (the first few
> architectures in the alphabetical list get build slaves assigned pretty
> consistently).

Here is an intermediate result from further experiments and statistics:

- fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
  and not what will be in the final 8.0 release) has a measurable
  performance impact - but it is not the big issue here.

- if we ignore netbsd-7* branches, the performance loss is reasonably
  explainable by the SVS penalty - we are going to check that theory soon.

- maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
  interaction with SVS, making those build times sky rocket - if turning
  off SVS does not solve this, we will need to dig deeper.

So stay tuned, maybe only Intel to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to
change/measure) please speak up.

Martin
Re: 8.0 performance issue when running build.sh?
On 06/07/2018 16:48, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:40:48PM +0200, Maxime Villard wrote:
> > > This are all successfull builds of HEAD for alpha that happened after
> > > June 1:
> >
> > What does that mean, are you building something *on* an Alpha CPU, or
> > are you building the Alpha port on another CPU?
>
> It is all about the releng auto build cluster, which is running amd64.

Then it is likely caused by two things:

* in EagerFPU I made a mistake initially, and it caused the FPU state to
  be restored when the kernel was leaving a softint.  I sent a pullup
  already for netbsd-8, but it hasn't yet been applied.  The fix removes
  a save+restore, which improves performance.

* the XSTATE_BV bitmap is not zeroed at execve time in NetBSD-8.  It
  probably causes some performance loss, because XRSTOR always restores
  each FPU state instead of just the ones we used.  In recent CPUs there
  are many FPU states and we use only a few in the base system, so the
  extra restoring costs us.  Even more so with EagerFPU, I guess, because
  we do save+restore unconditionally, rather than on-demand.  I fixed
  that in October 2017 in -current, but it didn't make it to -8.  I guess
  it will have to, now.

Beyond that we need to use XSAVEOPT, too.

Maxime