Great work! Nonetheless, I feel that the last few changes nerf a quad-core machine way too much: you are giving back 50% of what you gained in the -j48 buildkernel case, and with -j4, which is the most common case, the result is even worse than the original. buildworld -j8 on test29 also loses 50% of the original improvement with commit 2 or 3.

I don't think this is a good trade-off at all; are we really optimizing for 4-socket 48-core machines and leaving the far more common 4-8 core machines behind? Simply adding lwkt_yield()s all over the place doesn't sound like a great strategy in the first place. It sounds more like a stopgap or debug solution for a 48-core machine than something that should be committed (straight away).

Cheers,
Alex Hornung

On 29/10/11 00:28, Matthew Dillon wrote:
>      89.61 real   196.30 user    59.04 sys   test29 -j4 (patch)
>      86.55 real   195.14 user    49.52 sys   test29 -j4 (commit)
>      93.77 real   195.94 user    67.68 sys   test29 -j4 (commit 3)
>
>     167.62 real   360.44 user  4148.45 sys   monster -j48 (prepatch)
>     110.26 real   362.93 user  1281.41 sys   monster -j48 (patch)
>     101.68 real   380.67 user  1864.92 sys   monster -j48 (commit 1)
>      59.66 real   349.45 user   208.59 sys   monster -j48 (commit 3) <<<
>
>      96.37 real   209.52 user    63.77 sys   test29 -j48 (patch)
>      85.72 real   196.93 user    52.08 sys   test29 -j48 (commit 1)
>      90.01 real   196.91 user    70.32 sys   test29 -j48 (commit 3)
>
> Kernel build results are as expected for the most part. -j 48 build
> times on the many-cores monster are GREATLY improved, from 101 seconds
> to 59.66 seconds (and down from 167 seconds before this work began).
>
> That's a +181% improvement, almost 3x faster.
>
> The -j 4 build and the quad-core test29 build were not expected to show
> any improvement since there isn't really any spinlock contention with
> only 4 cores. There was a slight nerf on test28 (the quad-core box) but
> that might be related to some of the lwkt_yield()s added and not so
> much the PQ_INACTIVE/PQ_ACTIVE vm_page_queues[] changes.
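
For anyone who wants to re-derive the ratios from the quoted "real" times, here is a rough sketch in Python. The helper names pct_faster and fraction_given_back are made up for illustration, and "gain given back" is just one possible way to express the regression; the timings are copied from the quoted table.

```python
# Wall-clock ("real") seconds copied from the quoted buildkernel results.
REAL = {
    "test29 -j4": {"patch": 89.61, "commit": 86.55, "commit 3": 93.77},
    "monster -j48": {"prepatch": 167.62, "patch": 110.26,
                     "commit 1": 101.68, "commit 3": 59.66},
    "test29 -j48": {"patch": 96.37, "commit 1": 85.72, "commit 3": 90.01},
}


def pct_faster(before, after):
    """Percent speedup when a build goes from `before` to `after` seconds."""
    return (before / after - 1.0) * 100.0


def fraction_given_back(base, best, now):
    """Share of the base->best gain that is lost again at `now`.

    A value above 1.0 means `now` is slower than `base` itself.
    (Hypothetical metric, chosen only for this illustration.)
    """
    return (now - best) / (base - best)


if __name__ == "__main__":
    m = REAL["monster -j48"]
    print("monster -j48, prepatch -> commit 3: +%.0f%%"
          % pct_faster(m["prepatch"], m["commit 3"]))

    t48 = REAL["test29 -j48"]
    print("test29 -j48, gain given back by commit 3: %.2f"
          % fraction_given_back(t48["patch"], t48["commit 1"], t48["commit 3"]))

    t4 = REAL["test29 -j4"]
    print("test29 -j4, gain given back by commit 3: %.2f"
          % fraction_given_back(t4["patch"], t4["commit"], t4["commit 3"]))
```

The first line reproduces the +181% figure for monster; the other two put a number on how much of the earlier gain commit 3 costs on the quad-core box.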
