I like what I see. It could be the start of a way to manage a single socket's queue of processes. You do mention that this won't really scale beyond roughly 8 cores. I would love to see you extend this with a 2D array of weights that can be populated by various means (run-time testing, ACPI tables, etc.) with the relative costs of moving a process from one core to another (or from one scheduling queue to another, if queues are assigned to a set of cores, say a socket full of cores). That way, we may be able to have cores "shut down" by modifying these weights based on demand/load.
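To make the weight idea a bit more concrete, here is a rough sketch of the kind of table I have in mind. Every name in it (NQUEUES, migrate_cost, cost_init, cost_set_offline, cost_pick_dest) is made up for illustration and nothing here exists in the tree. It also assumes one queue per socket and a plain "load plus cost" comparison, which is only one of many ways to use the weights.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NQUEUES		4		/* hypothetical: one run queue per socket */
#define COST_OFFLINE	UINT_MAX	/* weight meaning "do not place work here" */

/* migrate_cost[from][to]: relative cost of moving a process between queues. */
static unsigned int migrate_cost[NQUEUES][NQUEUES];

/* Default fill: staying put is free, any move costs `remote`. */
void
cost_init(unsigned int remote)
{
	for (size_t i = 0; i < NQUEUES; i++)
		for (size_t j = 0; j < NQUEUES; j++)
			migrate_cost[i][j] = (i == j) ? 0 : remote;
}

/* "Shut down" queue q: mark moving onto it, and staying on it, as offline. */
void
cost_set_offline(size_t q)
{
	assert(q < NQUEUES);
	for (size_t i = 0; i < NQUEUES; i++)
		migrate_cost[i][q] = COST_OFFLINE;
}

/*
 * Pick the queue minimising load[to] + migrate_cost[from][to].
 * Ties go to the lowest index; returns -1 if every queue is offline.
 */
int
cost_pick_dest(size_t from, const unsigned int load[NQUEUES])
{
	unsigned long long best_score = ULLONG_MAX;
	int best = -1;

	assert(from < NQUEUES);
	for (size_t to = 0; to < NQUEUES; to++) {
		unsigned int c = migrate_cost[from][to];
		unsigned long long score;

		if (c == COST_OFFLINE)
			continue;	/* queue is shut down, skip it */
		score = (unsigned long long)load[to] + c;
		if (score < best_score) {
			best_score = score;
			best = (int)to;
		}
	}
	return best;
}
```

The table could just as well be filled from run-time measurements or firmware tables instead of cost_init(); the point is only that taking a queue offline becomes a change to the weights rather than a special case in the scheduler.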
While I agree with your comments that OpenBSD was not originally made for lots of CPUs/cores, I don't really wish to entertain huge steps backwards on that front. The rthreads work (among other work) has been done with an eye towards using MP systems more efficiently. Again, I like this; it looks simpler. Let's see you try to solve more of the MP parts as well. :)
-Toby.