Re: [Prime] Poor iteration times

Brian J. Beesley Tue, 08 Mar 2005 13:38:14 -0800

On Tuesday 08 March 2005 17:27, John R Pierce wrote:
> > You don't get something for nothing.  In order to use hyperthreading
> > there is an overhead cost to manage the multiple threads on the
> > processor.


The overhead is internal to the processor. It shows up in the fact that the 
Prescott P4 CPUs (with hyperthreading) have greater iteration times than 
Northwood P4 CPUs (without hyperthreading) running at the same clock 
frequency. Even with hyperthreading disabled! ISTR this is because of longer 
instruction pipelines, and therefore a greater loss of cycles when a branch is 
mispredicted: the prefetched instructions & data have to be invalidated, 
causing significant delays whilst instructions & data are reloaded (from 
cache or memory) to refill the pipelines.
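
A rough way to see why a longer pipeline costs iteration time at the same 
clock is to charge each mispredicted branch one pipeline refill. The Python 
sketch below is a toy model only: the function names and every number in it 
(20% branches, 5% of them mispredicted, a 20 vs 30 cycle flush) are 
hypothetical, chosen for illustration rather than measured on real P4s.

```python
# Toy cost model (all numbers are hypothetical, not measurements):
# a mispredicted branch flushes the pipeline, so the stall per mispredict
# grows with the number of stages that have to be refilled.

def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Average cycles per instruction, including misprediction stalls."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


def iteration_time_ns(instructions: int, cpi: float, clock_ghz: float) -> float:
    """Time for one iteration: total cycles divided by cycles per nanosecond."""
    return instructions * cpi / clock_ghz


# Same workload, same 3.0 GHz clock; only the flush penalty differs.
shorter_pipe = effective_cpi(1.0, 0.2, 0.05, 20)  # about 1.2
longer_pipe = effective_cpi(1.0, 0.2, 0.05, 30)   # about 1.3

if __name__ == "__main__":
    for name, cpi in (("shorter pipeline", shorter_pipe),
                      ("longer pipeline", longer_pipe)):
        t = iteration_time_ns(1_000_000, cpi, 3.0)
        print(f"{name}: CPI {cpi:.2f}, {t:.0f} ns per iteration")
```

Under these made-up numbers the longer pipeline is about 8% slower per 
iteration at an identical clock, with no hyperthreading involved at all.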

> > If you are not using software that is optimized for
> > hyperthreading then it is likely that it will run slower than if HT is
> > disabled.

Eh? It's those cycles wasted by prefetch failures which are potentially 
available to the "hyperthreaded" process. Nothing is lost. Hyperthreaded CPUs 
are NOT dual core...
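
To put "nothing is lost" in numbers, here is a toy model in Python (an 
assumption for illustration, not how the P4 actually schedules): each thread 
is independently ready to issue in a given cycle with probability p_ready, 
and the core issues for only one thread per cycle. Cycles one thread wastes 
waiting can go to the other, so total throughput rises from p to 
1 - (1 - p)^2; but if a thread is never stalled (p = 1) the second thread 
adds nothing and each runs at half speed. The function name is hypothetical.

```python
def smt_throughput(p_ready: float, threads: int) -> tuple[float, float]:
    """Return (total, per_thread) work units per cycle on one physical core.

    Toy assumptions: each thread is independently ready to issue in a cycle
    with probability p_ready (otherwise stalled on a branch flush or a memory
    access), and the core issues work for only one thread per cycle.
    """
    if not 0.0 <= p_ready <= 1.0:
        raise ValueError("p_ready must be a probability")
    if threads < 1:
        raise ValueError("need at least one thread")
    # A cycle does useful work if at least one thread is ready.
    total = 1.0 - (1.0 - p_ready) ** threads
    return total, total / threads


if __name__ == "__main__":
    for p in (0.6, 1.0):
        one_total, _ = smt_throughput(p, 1)
        two_total, two_each = smt_throughput(p, 2)
        print(f"p={p}: 1 thread {one_total:.2f}, "
              f"2 threads {two_total:.2f} total / {two_each:.2f} each")
```

With p_ready = 0.6 the core goes from 0.6 to 0.84 units per cycle in total, 
while each thread on its own drops to 0.42. With p_ready = 1.0 the total 
stays at 1.0 and each thread gets 0.5, which is the "two threads competing 
for one set of resources" case.
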
> >
> > I see the same thing in my parallel data compression code.  On a
> > dual-processor machine with HT turned on (so 2 physical and 2 virtual
> > CPUs), the software runs more slowly.  When I turn it off, I see quite
> > a speed boost.

Sure. Two threads competing for one set of resources. Repeat after me: 
hyperthreading is NOT DUAL CORE technology! If your parallel code assumes 
that all four CPUs are the same - as opposed to four logical processors 
running on two physical processors - then it's very likely to cripple itself 
by making completely wrong assumptions about the symmetry of resource 
availability.
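
For compute-bound parallel code, one way to avoid the wrong symmetry 
assumption is to size the worker pool by physical processors rather than by 
logical CPUs. A minimal Python sketch, assuming the program already has a 
mapping from logical CPU id to a unique physical core id (the function name 
and the example mapping are hypothetical; a real program would take this from 
the operating system's CPU topology information):

```python
def one_cpu_per_core(topology: dict[int, int]) -> list[int]:
    """Pick the lowest-numbered logical CPU on each physical core.

    topology maps logical CPU id -> physical core id (hypothetical input).
    Hyperthread siblings share a core id, so only one of them is kept.
    """
    chosen: dict[int, int] = {}
    for logical, core in sorted(topology.items()):
        chosen.setdefault(core, logical)  # first (lowest) logical CPU wins
    return sorted(chosen.values())


if __name__ == "__main__":
    # Hypothetical dual-processor box with HT: logical 0 & 2 share one
    # physical processor, logical 1 & 3 share the other.
    topo = {0: 0, 1: 1, 2: 0, 3: 1}
    cpus = one_cpu_per_core(topo)
    print(f"use {len(cpus)} compute workers, on logical CPUs {cpus}")
```

Note that Python's os.cpu_count() reports logical CPUs, so on such a machine 
it would say 4 where this sketch picks 2.
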
>
> otoh, a test workload on one of our server processes runs WAY faster with
> HT enabled on a dual xeon (so 4 virtual CPUs)...  this load is a java
> message processing task which fires off a lot of threads that talk to an
> oracle database that does significant computational work (albeit not
> numerical) in oracle pl/sql stored procedures. 

Yeah, the threads will be desynchronised (most things on a server, as opposed 
to a computing engine, work asynchronously) so there is a real chance that 
execution resources which would sit idle on a plain (non-hyperthreaded) 
processor get used by another thread.

>  there's also extensive
> disk IO, primarily writes as oracle updates its indices and writes its redo
> logs.  reads are almost 100% cached.

Irrelevant except in so far as the disk accesses will help keep the threads 
from competing for the same resources at the same time.
>
> however, this same workload runs twice as fast on an amd64 Opteron server
> (actually, a quad opteron 2.2GHz ran 4x faster than a dual xeon 2.8GHz)
> which doesn't have any hyperthreading.  so, this proves little.

Ah, but the Opteron is inherently MUCH more efficient - much shorter 
pipelines, and it's a 64-bit engine as opposed to 32-bit, so lots of 
integer-type operations which take multiple cycles on P4 / Xeon will execute 
in one cycle on Opteron. In fact there are multiple parallel execution units 
involved, so with an efficient compiler which knows about vector processing 
(or hand-tuned assembly code ;) several of those operations can run at once.
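
The 64-bit point can be made concrete with addition: a 32-bit x86 CPU adds 
two 64-bit integers as an ADD on the low halves followed by an ADC (add with 
carry) on the high halves, where a 64-bit CPU needs a single ADD. The Python 
sketch below emulates that split and counts the steps. It illustrates 
instruction count only, not a cycle-accurate model of either processor, and 
the function names are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def _check(x: int) -> None:
    if not 0 <= x <= MASK64:
        raise ValueError("operand must be an unsigned 64-bit value")


def add64_with_32bit_halves(a: int, b: int) -> tuple[int, int]:
    """Add two unsigned 64-bit values using only 32-bit pieces.

    Returns (sum mod 2**64, number of 32-bit add steps used), mirroring
    ADD on the low words followed by ADC on the high words.
    """
    _check(a)
    _check(b)
    lo = (a & MASK32) + (b & MASK32)     # step 1: ADD low halves
    carry = lo >> 32                     # carry out of the low word
    hi = (a >> 32) + (b >> 32) + carry   # step 2: ADC high halves + carry
    return ((hi & MASK32) << 32) | (lo & MASK32), 2


def add64_native(a: int, b: int) -> tuple[int, int]:
    """Same sum as a single 64-bit ADD would produce."""
    _check(a)
    _check(b)
    return (a + b) & MASK64, 1


if __name__ == "__main__":
    a, b = 0xFFFFFFFF, 1  # low-word overflow forces a carry into the high word
    print(add64_with_32bit_halves(a, b), add64_native(a, b))
```

Both give the same sum; the 32-bit route just takes twice as many steps, and 
multiplication or division of 64-bit values fares considerably worse.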

For most things which do NOT depend on SSE2, even the 32-bit Athlon 
outperforms P4 at the same clock speed by a large margin. Remember that AMD 
were losing sales by describing their clock speeds accurately, so they invented 
the "equivalent speed rating" with which we're now stuck :( Even so, it's 
more informative than Intel's "model number" scheme.

Regards
Brian Beesley
_______________________________________________
Prime mailing list
http://hogranch.com/mailman/listinfo/prime
