On Tuesday 08 March 2005 17:27, John R Pierce wrote:
> > You don't get something for nothing. In order to use hyperthreading
> > there is an overhead cost to manage the multiple threads on the
> > processor.

The overhead is internal to the processor. It shows up in the fact that the
Prescott P4 CPUs (with hyperthreading) have greater iteration times than
Northwood P4 CPUs (without hyperthreading) running at the same clock
frequency. Even with hyperthreading disabled! ISTR this is because of the
longer instruction pipelines, and therefore a greater loss of cycles when a
branch is guessed incorrectly: prefetched instructions & data have to be
invalidated, causing significant delays whilst instructions & data are
loaded from memory to refill the pipelines.

> > If you are not using software that is optimized for
> > hyperthreading then it is likely that it will run slower than if HT is
> > disabled.

Eh? It's those cycles wasted by prefetch failures which are potentially
available to the "hyperthreaded" process. Nothing is lost. Hyperthreaded
CPUs are NOT dual core...

> > I see the same thing in my parallel data compression code. On a
> > dual-processor machine with HT turned on (so 2 physical and 2 virtual
> > CPUs), the software runs more slowly. When I turn it off, I see quite
> > a speed boost.

Sure. Two threads competing for one set of resources. Repeat after me,
hyperthreading is NOT DUAL CORE technology! If your parallel code is
assuming that all four CPUs are the same - as opposed to four virtual
processors running in two physical cores - then it's very likely going to
cripple itself by making completely wrong assumptions about the symmetry of
resource availability.

> otoh, a test workload on one of our server processes runs WAY faster with
> HT enabled on a dual xeon (so 4 virtual CPUs)... this load is a java
> message processing task which fires off a lot of threads that talk to a
> oracle database that does significant computational work (albeit not
> numerical) in oracle pl/sql stored procedures.

Yeah, the threads will be desynchronised (most things on a server, as
opposed to a computing engine, work asynchronously), so there is a real
chance of resources becoming available due to hyperthreading which would be
wasted on a plain processor setup.

> there's also extensive
> disk IO primarily writes as oracle updates its indices and writes its redo
> logs. reads are almost 100% cached.

Irrelevant, except in so far as the disk accesses will help keep the threads
from competing for the same resources at the same time.

> however, this same workload runs twice as fast on a am64 Opteron server
> (actually, a quad opteron 2.2Ghz ran 4x faster than a dual xeon 2.8 2.2Ghz)
> which doesn't have any hyperthreading. so, this proves little.

Ah, but the Opteron is inherently MUCH more efficient - much shorter
pipelines, and it's a 64 bit engine as opposed to 32 bit, so lots of integer
type operations which take multiple cycles on P4 / Xeon will execute in one
cycle on Opteron. In fact there will be multiple parallel execution units
involved, so with an efficient compiler which knows about vector
processing - or hand-tuned assembly code ;) - the gap can be wider still.

For most things which do NOT depend on SSE2, even the 32-bit Athlon
outperforms P4 at the same clock speed by a large margin. Remember that AMD
were losing sales by describing their clock speeds accurately, so they
invented the "equivalent speed rating" with which we're now stuck :( Even
so, it's more informative than Intel's "model number" scheme.

Regards
Brian Beesley
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
