On Tuesday 15 February 2005 04:53, John R Pierce wrote:
> Jud McCranie wrote:
> > At 09:17 PM 2/14/2005, John R Pierce wrote:
> >> yes, and the operating systems inability to know how effective
> >> hyperthreading can be.
"Effective" depends on the exact resources (registers, cache lines etc.) required by the active threads. I'm not aware that any OS has a means of tagging instruction blocks in this way. Even if it did, the overheads involved in tracking the tags would almost certainly exceed the benefits which could be realised.

The situation with respect to symmetric multiprocessing (multiple physical CPUs) is similar but rather simpler, insofar as only memory resources are contended for. Though effective SMP OSes have been around for a long time, it's only recently that memory contention issues have been looked at seriously, and even then there's dispute - see for instance the Linux kernel developers' list.

> > It is on Windows XP home, SP2. Would XP Pro be better?
>
> same kernel... there's really no accurate way to guess how much the 2nd
> virtual CPU is 'worth' when one thread is 100% cpu bound.

Yeah, you need Pro rather than Home if you have a multiprocessor board with more than one CPU installed; otherwise you're burning electricity in the extra physical CPU(s) without being able to use the extra CPU power available. But if you have a uniprocessor system with a hyperthread-capable CPU installed, buying XP Pro instead of XP Home will upgrade Bill Gates' pension rather than your system ;-)

Again, "CPU bound" is a term which has little meaning - a tight loop may use almost none of the integer or floating-point execution units, and make no demands on memory resources at all, yet soak up all the available CPU cycles. It's this sort of situation (sloppy programming, really) that hyperthreading exists to exploit. You can reasonably expect tightly optimised code (prime95) to leave not very much in the way of resources for a second "virtual" processor to mop up. However, there will almost certainly be _some_ resources not fully utilised. For instance, LL testing thrashes the FPU and the memory subsystem.
Something else which uses little memory access and makes no demand on the FPU might be able to run in parallel with little interference.

It occurs to me that there might be scope for a version of the trial factoring algorithm optimised to run in parallel with LL testing on a hyperthreaded CPU. This might be much less efficient than the existing TF algorithm (at any rate for possible factors bigger than 2^64), but nevertheless worthwhile given that HT processors, and OSes which are at least HT aware, are now the norm (for those not addicted to Athlons). There would be an additional benefit insofar as trial factoring is no longer running usefully ahead of LL testing, and (even with the efficient uniprocessor algorithm currently implemented) trial factoring is a poor use of a P4 system.

The downside is that any attempt to use _all_ the available resources in a CPU will inevitably increase power consumption and therefore heat output; this could be an issue with some systems. Note that the current "cap" on P4 execution speeds results largely from thermal dissipation issues rather than fabrication problems.

So far as the "trial factoring deficit" is concerned, I think we should now make TF the "best suited" assignment type for ALL processors except the P4 and Athlons above 1.2 GHz. We should probably also look at running P-1 between TF to one bit less than at present and finishing the TF, and at reducing some of the TF cutoff points (where the depth increases by another bit). The point is that improvements in LL testing efficiency in recent versions - which "rub off" onto P-1 - have not been matched by improvements in TF throughput. No criticism intended, as the TF code is actually startlingly efficient already!

Brian Beesley
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
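P.S. For anyone wondering why trial factoring is cheap enough to consider as a "mop-up" thread: every factor of M_p = 2^p - 1 has the special form q = 2kp + 1 with q congruent to 1 or 7 (mod 8), so only a thin sliver of candidates ever needs testing. A toy Python sketch of the idea (illustrative only - the real TF code sieves candidates and is far more efficient):

```python
def trial_factor(p, max_bits=40):
    """Search for a small factor of M_p = 2**p - 1 (p an odd prime).

    Every factor q of M_p has the form q = 2*k*p + 1 and satisfies
    q % 8 in (1, 7); those two facts let TF skip most candidates.
    Returns the smallest such factor below 2**max_bits, or None.
    """
    m = (1 << p) - 1
    k = 1
    while True:
        q = 2 * k * p + 1
        if q * q > m or q >= (1 << max_bits):
            return None                       # nothing found in range
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q                          # q divides M_p
        k += 1

# M_11 = 2047 has the factor 23 = 2*1*11 + 1; M_13 = 8191 has none (it's prime).
```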
