------- Comment #15 from johnfb at mail dot utexas dot edu 2010-07-30 14:00 -------
We have also had some trouble with this issue. We found that, in general, if we were running on a machine with hardware threads (i.e., Intel's Hyper-Threading), performance was poor. Most of our runs were on a machine with an Intel Xeon L5530 running RHEL 5. Profiling showed that our program was spending 50% of its time inside libgomp. Setting either GOMP_SPINCOUNT or OMP_WAIT_POLICY as discussed in this thread increased performance greatly. In experiments that disabled and enabled cores under the default OMP settings, performance on some runs dropped below what we got with only one core enabled once the Hyper-Threaded cores came online.
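For anyone wanting to try the same workaround, here is a minimal sketch of launching an OpenMP program with those two variables set. GOMP_SPINCOUNT and OMP_WAIT_POLICY are the libgomp settings named above; the helper names and the program path in the usage comment are hypothetical, and the values that work best will depend on the workload.

```python
import os
import subprocess


def passive_wait_env(base=None, spincount=None):
    """Return a copy of the environment that asks libgomp not to busy-wait.

    base      -- environment mapping to start from (defaults to os.environ)
    spincount -- optional value for GOMP_SPINCOUNT; left unset when None
    """
    env = dict(os.environ if base is None else base)
    # PASSIVE asks waiting threads to sleep rather than spin.
    env["OMP_WAIT_POLICY"] = "PASSIVE"
    if spincount is not None:
        # Upper bound on busy-wait iterations before a thread blocks.
        env["GOMP_SPINCOUNT"] = str(spincount)
    return env


def run_with_passive_wait(cmd, spincount=None):
    """Run cmd (a list of arguments) with the adjusted environment."""
    return subprocess.run(cmd, env=passive_wait_env(spincount=spincount), check=True)


# Hypothetical usage; "./my_openmp_program" stands in for your own binary:
# run_with_passive_wait(["./my_openmp_program"], spincount=0)
```

Either variable alone was enough in our case, so the spin count is optional here; setting both is harmless but not required.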
A little thought about how hardware threads are implemented makes it clear why spinning for more than a few cycles causes performance problems: if one hardware thread spins, the other threads on that core may be starved, because execution resources are shared between the hardware threads of a core.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706