On 4/25/2012 1:00 PM, Jeff Squyres wrote:
On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote:

Sounds rather bizarre. Do you have lstopo on your machine? Might be useful to 
see the output of that so we can understand what it thinks the topology is like 
as this underpins the binding code.

The -nooversubscribe option is a red herring here - it has nothing to do with 
the problem, nor will it help.

FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your process on any 
specific core at all - we are simply launching it on the node. It sounds to me like your 
code is incorrectly identifying "sharing" when a process isn't bound to a 
specific core.

+1

Put differently: if you're not binding your processes to processor cores, then 
it's quite likely/possible that multiple processes *are* running on the same 
processor cores, at least intermittently, because the OS is allowed to migrate 
processes to whatever processor cores it wants to.
However, Kyle mentioned previously that he was using the -bind-to-core option. I would suggest adding -report-bindings to the mpirun command line to see what mpirun really thinks it is binding to, if it is binding at all.

There is one piece of information that seems to be missing, and it is confusing me. Kyle, how is your code determining that it is the only process bound to a core or, conversely, that another process is bound to the same core?
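
For reference, here is a minimal sketch of one way a process on Linux could inspect its own binding. It is not necessarily what Kyle's code does. It assumes Python 3 on Linux, where os.sched_getaffinity is available; the helper names describe_binding and overlapping_cpus are made up for illustration. Note that the numbers reported are logical CPUs as the OS sees them, so a single core with hardware threads may legitimately show up as more than one entry.

```python
import os


def describe_binding(allowed_cpus):
    """Summarize an affinity set (a collection of logical CPU ids)."""
    cpus = sorted(allowed_cpus)
    if len(cpus) == 1:
        return "restricted to logical CPU %d" % cpus[0]
    # More than one allowed CPU: the OS may migrate the process among them.
    return "allowed on %d logical CPUs: %s" % (len(cpus), cpus)


def overlapping_cpus(allowed_a, allowed_b):
    """Return the sorted logical CPU ids present in both affinity sets."""
    return sorted(set(allowed_a) & set(allowed_b))


if __name__ == "__main__":
    # os.sched_getaffinity is Linux-only; pid 0 means "the calling process".
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print("pid %d is %s" % (os.getpid(), describe_binding(allowed)))
    else:
        print("sched_getaffinity is not available on this platform")
```

If each rank printed its own set, overlapping_cpus could be applied to any pair of ranks on the same node. A non-empty result for two ranks that are each restricted to one core would indicate real sharing, whereas two unbound ranks will normally show the whole node as their allowed set, and that overlap says nothing about binding.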

--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>


