This is getting distinctly off topic but nevertheless interesting. If anyone 
objects I propose we take it off line.

On Wednesday 09 March 2005 01:34, Jeff Gilchrist wrote:
>
> Yes, Prescotts have longer pipelines and do have greater iteration
> times than Northwood.  What I am saying is that if you take a Prescott
> with HT enabled, it will have greater iteration times than the exact
> same CPU with HT disabled.  The pipeline length is the same in both
> cases, so there is obviously other overhead at work to manage multiple
> threads.
>
> > Eh? It's those cycles wasted by prefetch failures which are potentially
> > available to the "hyperthreaded" process. Nothing is lost. Hyperthreaded
> > CPUs are NOT dual core...
>
> Sorry, I should have been more specific.  I meant to say if your
> application is not multi-threaded then it will not take advantage of
> the HT capabilities.  That is what I meant when I said "optimized for
> HT".
>
> > Sure. Two threads competing for one set of resources. Repeat after me,
> > hyperthreading is NOT DUAL CORE technology! If your parallel code is
> > assuming that all four CPUs are the same - as opposed to four virtual
> > processors running in two physical cores - then it's very likely going to
> > cripple itself by making completely wrong assumptions about the symmetry
> > of resource availability.
>
> Thanks Brian, I know HT != dual core.  I am saying that when I run my
> parallel code with the affinity of the two threads set to the two
> physical processors, it runs much faster when HT is disabled than when
> it is enabled.  There are no other applications running on the machine
> (except regular OS housekeeping).  Again, there is obviously some
> overhead in managing HT if I see such a big difference in performance
> when my software is only running on the physical processors and not
> the virtual ones.
>
> Repeat after me: hyperthreading increases the overhead of processing
> on the same CPU when it is enabled, given that the length of the
> pipeline is the same.

The "other overhead" is going to be the operating system - specifically the 
task scheduler. Don't forget that _no_ process runs uninterrupted; there is 
always something else going on, e.g. timer interrupts, or server processes 
checking whether there is any work for them to do.
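As a rough illustration of how un-quiet an "idle" machine really is, this 
Linux-specific sketch reads the kernel's cumulative context-switch ("ctxt") 
and interrupt ("intr") counters from /proc/stat twice, one second apart 
(field names per the proc(5) man page):

```python
# Sketch: even an "idle" Linux box is never quiet. Sample the kernel's
# cumulative context-switch and interrupt counters twice, 1s apart.
import time

def stat_counters():
    counters = {}
    with open("/proc/stat") as f:
        for line in f:
            fields = line.split()
            if fields[0] in ("ctxt", "intr"):
                counters[fields[0]] = int(fields[1])
    return counters

before = stat_counters()
time.sleep(1)
after = stat_counters()

for name in ("ctxt", "intr"):
    print(f"{name}: +{after[name] - before[name]} in 1s")
```

Even with no applications running, both counters tick up continuously - 
that background activity is exactly what competes with a benchmark thread.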

Irrespective of differences between virtual and real processor cores, the 
task scheduler clearly has a more complicated job when there is more than one 
"execution unit" to keep fed with work. The fact that the task scheduler is a 
process (or at least a thread) in its own right, with a need for (at least) 
CPU and memory resources, complicates matters still further.

I suspect that what is happening when hyperthreading is enabled, and the task 
scheduler isn't tuned properly, is that sometimes the active user thread 
(which is being benchmarked) runs on the "B" virtual processor instead of the 
"A" VP (which is in use by the task scheduler, or an active "background" 
server process) and thus gets access to far less of the physical CPU 
resources than would be the case if it were running on the "A" VP. This would 
explain your observation.

Does the problem persist if your foreground process(es) have their processor 
affinity set to the "A" virtual processor(s)?
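For what it's worth, that experiment is easy to set up. This is just a 
sketch using a present-day Python call (os.sched_setaffinity, Linux only); 
which logical CPU numbers correspond to the "A" virtual processors is 
platform-dependent, so CPU 0 stands in for an "A" VP here:

```python
# Sketch: pin the current process to a single logical CPU so it cannot
# be migrated onto a hyperthread sibling by the scheduler.
import os

os.sched_setaffinity(0, {0})   # pid 0 = this process; {0} = logical CPU 0
print("now restricted to CPUs:", sorted(os.sched_getaffinity(0)))
```

If the slowdown really is the scheduler occasionally landing the benchmark 
on a "B" VP, forcing it onto the "A" VP this way should make it disappear.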

Does the problem you report affect only SMP systems? I tried Prime95 with 
affinity set to 0 on a uniprocessor Prescott P4 system running Win XP Home 
and found a very small (~1%; may not be significant) _reduction_ in iteration 
time when hyperthreading was enabled in BIOS - provided that there was 
nothing much else running on the system. However, with normal interactive work 
going on as well, the iteration time was definitely (though not hugely) worse 
with hyperthreading enabled, though the interactive response was definitely 
"snappier" too. I do not have access to a multiprocessor system with HT 
capability so I'm unable to replicate the circumstances of your observation.

What happens with Linux is definitely going to be very dependent on the 
kernel. A lot of work has gone into hyperthreading support, as well as 
"tuning" of the task scheduler (amounting to an almost complete rewrite!), 
even in the "stable" 2.6 series.
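One piece of that work: the kernel exposes which logical CPUs are 
hyperthread siblings of the same physical core, so an HT-aware scheduler 
(or an affinity-setting application) can avoid stacking two busy threads 
onto one core. A Linux-only sketch reading that sysfs file (path per the 
kernel's sysfs-devices-system-cpu documentation):

```python
# Sketch: ask the kernel which logical CPUs share a core with CPU 0.
from pathlib import Path

path = Path("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list")
siblings = None
if path.exists():
    siblings = path.read_text().strip()   # e.g. "0,2" or "0-1" on HT systems
    print("cpu0 shares a core with logical CPUs:", siblings)
else:
    print("topology information not exposed on this system")
```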

As for Windows, only XP and Server 2003 support hyperthreading at all, and 
even then it seems to be more or less an afterthought. Blame Intel rather 
than Microsoft! But, if hyperthreading sometimes works less than perfectly, 
at least it does _in most circumstances_ allow more total computing power to 
be used than is available in the same system with hyperthreading disabled. If 
the reverse is true given your very specific application, then at least you 
have the option of disabling HT.

BTW I do have a system with an HT-capable chipset (Intel E7205) fitted with a 
Northwood CPU. BIOS rightly refuses to enable HT. Anyway the point is that 
the iteration times on this system are what I would expect them to be; it 
doesn't seem to be the chipset that's causing any slowdown you might be 
seeing with HT enabled.

Regards
Brian Beesley
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
