On 4/25/2012 1:00 PM, Jeff Squyres wrote:
On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote:
Sounds rather bizarre. Do you have lstopo on your machine? It might be useful to
see its output so we can understand what it thinks the topology looks like,
as this underpins the binding code.
The -nooversubscribe option is a red herring here - it has nothing to do with
the problem, nor will it help.
FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your process on any
specific core at all - we are simply launching it on the node. It sounds to me like your
code is incorrectly identifying "sharing" when a process isn't bound to a
specific core.
+1
Put differently: if you're not binding your processes to processor cores, then
it's quite possible that multiple processes *are* running on the same
processor cores, at least intermittently, because the OS is allowed to migrate
processes to whatever processor cores it wants to.
However, Kyle mentioned previously that he was using the -bind-to-core
option. I would suggest adding -report-bindings to the mpirun command
line to see what mpirun really thinks it is binding to, if it binds at all.
One piece of information seems to be missing, and it is confusing me.
Kyle, how is your code determining that it is the only process bound to a core,
or, conversely, that another process is bound to the same core?
--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com