So after more investigation, here is an update: the effect depends strongly on passing "too high" -j arguments to build.sh. As a stopgap hack I reduced them on the build cluster; the netbsd-7 builds are now slow, but no longer extremely slow (and all other builds are slightly slower).
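To find that sweet spot, a sweep along the following lines can be used (an illustrative sketch, not the exact script used here; it assumes build.sh in the current directory and the same setup as described below):

```shell
#!/bin/sh
# Illustrative sketch: time the same build at several -j values to find
# the knee of the curve. Assumes ./build.sh exists and the environment
# (tmpfs dirs, prebuilt tools) is already set up; -j range is arbitrary.
measure() {
	start=$(date +%s)
	"$@" >/dev/null 2>&1
	echo $(( $(date +%s) - start ))
}

if [ -x ./build.sh ]; then
	for j in 8 11 12 16 24; do
		t=$(measure ./build.sh -m alpha -j "$j" release)
		echo "-j $j: ${t}s elapsed"
	done
fi
```

Running it several times per -j value and taking the minimum would smooth out noise from other cluster activity.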
I tested building netbsd-7 on a netbsd-8 build cluster slave a lot, and the ideal -j value for a single build is only 11 or 12. Builds of other branches apparently do better with higher -j values, but I have not done extensive testing there (yet).

With a bit of help (thanks to everyone involved!) I got lockstat output, and the single *really* hot lock on 8.0 is kernel_lock, while everything else looks quite normal. So for kernel_lock we traced the callers, and got this:

On a build cluster slave running 7.1_STABLE, building netbsd-7 with "build.sh -m alpha -j 24" and source, obj, dest, release and tool dirs all on tmpfs, tools already built, we get:

Elapsed time: 1475.74 seconds.

-- Kernel lock spin

Total%  Count   Time/ms Lock                   Caller
------ ------- --------- ---------------------- ------------------------------
100.00  159874    366.69 kernel_lock            <all>
 41.76   65404    153.12 kernel_lock            VOP_POLL+43
 19.29   36948     70.74 kernel_lock            VOP_LOCK+43
 13.25   20066     48.59 kernel_lock            tcp_send_wrapper+22
  7.45    8042     27.30 kernel_lock            intr_biglock_wrapper+12
  4.94    6220     18.11 kernel_lock            sleepq_block+1c7
  4.88    5654     17.91 kernel_lock            fileassoc_file_delete+1c
  3.49    7681     12.80 kernel_lock            VOP_UNLOCK+40
  2.59    2558      9.50 kernel_lock            bdev_strategy+52
  1.05    5626      3.85 kernel_lock            VOP_WRITE+48
  0.38     904      1.38 kernel_lock            VOP_READ+48
  0.29     254      1.05 kernel_lock            biodone2+6f
  0.23     103      0.84 kernel_lock            frag6_fasttimo+1a
  0.15     217      0.57 kernel_lock            tcp_input+e27
  0.15     102      0.54 kernel_lock            tcp_input+1366
  0.02      17      0.08 kernel_lock            callout_softclock+38d
  0.02       6      0.06 kernel_lock            ip6flow_slowtimo+e
  0.02      13      0.06 kernel_lock            ip_slowtimo+1a
  0.02      15      0.06 kernel_lock            ip6flow_slowtimo_work+1f
  0.02      17      0.06 kernel_lock            frag6_slowtimo+1b
  0.01      16      0.04 kernel_lock            ipflow_slowtimo_work+1f
  0.01      10      0.02 kernel_lock            ipflow_slowtimo+e
  0.00       1      0.01 kernel_lock            udp_send_wrapper+2b

Same setup, on a build cluster slave running 8.0_RC2, we get:

Elapsed time: 3857.33 seconds.
-- Kernel lock spin

Total%     Count     Time/ms Lock                   Caller
------ --------- ----------- ---------------------- ------------------------------
100.00 838202502 25353589.88 kernel_lock            <all>
 86.70 662529722 21982378.01 kernel_lock            sleepq_block+1b5
  6.27  62286987  1590912.88 kernel_lock            VOP_LOCK+b2
  4.05  54443880  1027059.31 kernel_lock            VOP_LOOKUP+5e
  2.38  51325126   603535.23 kernel_lock            VOP_UNLOCK+70
  0.42   5623585   105948.06 kernel_lock            VOP_GETATTR+5e
  0.05    741051    13894.97 kernel_lock            VOP_POLL+93
  0.02    290699     5546.12 kernel_lock            tcp_send_wrapper+22
  0.02     91076     5167.14 kernel_lock            intr_biglock_wrapper+16
  0.02    206972     4178.13 kernel_lock            VOP_INACTIVE+5d
  0.01    133766     3472.19 kernel_lock            scsipi_adapter_request+63
  0.01    161192     3109.39 kernel_lock            fileassoc_file_delete+20
  0.01    101838     1833.00 kernel_lock            VOP_CREATE+62
  0.01     83442     1812.25 kernel_lock            VOP_WRITE+61
  0.01     64171     1342.68 kernel_lock            VOP_READ+61
  0.00     28336      846.55 kernel_lock            softint_dispatch+dc
  0.00     37851      772.09 kernel_lock            VOP_REMOVE+61
  0.00     10945      606.49 kernel_lock            ipintr+37
  0.00      7435      353.17 kernel_lock            biodone2+6d
  0.00     17025      284.61 kernel_lock            VOP_ACCESS+5d
  0.00      6108      172.84 kernel_lock            frag6_fasttimo+1a
  0.00      1200      103.23 kernel_lock            callout_softclock+2b1
  0.00      2789       75.00 kernel_lock            nd6_timer_work+49
  0.00      2426       70.32 kernel_lock            ip_slowtimo+1a
  0.00      2384       57.25 kernel_lock            frag6_slowtimo+1f
  0.00       818       16.57 kernel_lock            VOP_ABORTOP+94
  0.00       567       14.15 kernel_lock            VFS_SYNC+65
  0.00       352       10.43 kernel_lock            kevent1+692
  0.00       428        8.72 kernel_lock            VOP_LINK+61
  0.00       129        4.86 kernel_lock            udp_send_wrapper+2b
  0.00        58        1.73 kernel_lock            kqueue_register+419
  0.00        56        1.46 kernel_lock            knote_detach+135
  0.00         5        0.27 kernel_lock            arpintr+21
  0.00        12        0.25 kernel_lock            kqueue_register+25b
  0.00         8        0.21 kernel_lock            cdev_ioctl+8c
  0.00        23        0.14 kernel_lock            VOP_RMDIR+61
  0.00         2        0.08 kernel_lock            tcp_attach_wrapper+18
  0.00         2        0.06 kernel_lock            tcp_detach_wrapper+16
  0.00        26        0.02 kernel_lock            VOP_MKDIR+65
  0.00        10        0.01 kernel_lock            VOP_READDIR+6d

This is sleepq_block re-acquiring the
kernel_lock when the caller held it before going to sleep. I see no easy way to get a list of the callers that do that, nor of the time spent sleeping (we may be on a different CPU when waking up, right?), so any ideas on how to diagnose this further are welcome.

Martin
