Hi,
Okay, let's reboot, even though Gilles' last mail was onto something.
The problem is that I failed to start programs with mpirun when more
than one node was involved. I mentioned that it is likely a
configuration problem with my server, especially authentication (we
have some Kerberos nightmare going on here).
We then tried to find out where mpirun went wrong, got sidetracked by
the nodefile, and I reconfigured everything so that we can at least
rule out any NUMA issues.
Now I see that I am on the wrong mailing list: it is likely a problem
with PBS/Torque, since all three tasks report the same host
(a00551.science.domain):
pbsdsh -v hostname
pbsdsh(): spawned task 0
pbsdsh(): spawned task 1
pbsdsh(): spawned task 2
pbsdsh(): spawn event returned: 0 (3 spawns and 0 obits outstanding)
pbsdsh(): sending obit for task 0
a00551.science.domain
pbsdsh(): spawn event returned: 1 (2 spawns and 1 obits outstanding)
pbsdsh(): sending obit for task 1
a00551.science.domain
pbsdsh(): spawn event returned: 2 (1 spawns and 2 obits outstanding)
pbsdsh(): sending obit for task 2
a00551.science.domain
pbsdsh(): obit event returned: 0 (0 spawns and 3 obits outstanding)
pbsdsh(): task 0 exit status 0
pbsdsh(): obit event returned: 1 (0 spawns and 2 obits outstanding)
pbsdsh(): task 1 exit status 0
pbsdsh(): obit event returned: 2 (0 spawns and 1 obits outstanding)
pbsdsh(): task 2 exit status 0
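Gilles expected three different hostnames here; the log instead shows a00551.science.domain three times. The following is a minimal sketch of a hypothetical helper (not part of Torque or Open MPI) that counts the distinct hosts in pbsdsh -v output. It assumes that every non-empty line not starting with "pbsdsh(" is the output of hostname from one task, which holds for the log above. It needs Python 3.9+ for the list[str] annotation.

```python
# Hypothetical helper (not part of Torque or Open MPI): count the distinct
# hosts that ran tasks in `pbsdsh -v hostname` output.
# Assumption: every non-empty line that does not start with "pbsdsh(" is the
# output of `hostname` from one task, as in the log above.

# The log above, verbatim.
PBSDSH_LOG = """\
pbsdsh(): spawned task 0
pbsdsh(): spawned task 1
pbsdsh(): spawned task 2
pbsdsh(): spawn event returned: 0 (3 spawns and 0 obits outstanding)
pbsdsh(): sending obit for task 0
a00551.science.domain
pbsdsh(): spawn event returned: 1 (2 spawns and 1 obits outstanding)
pbsdsh(): sending obit for task 1
a00551.science.domain
pbsdsh(): spawn event returned: 2 (1 spawns and 2 obits outstanding)
pbsdsh(): sending obit for task 2
a00551.science.domain
pbsdsh(): obit event returned: 0 (0 spawns and 3 obits outstanding)
pbsdsh(): task 0 exit status 0
pbsdsh(): obit event returned: 1 (0 spawns and 2 obits outstanding)
pbsdsh(): task 1 exit status 0
pbsdsh(): obit event returned: 2 (0 spawns and 1 obits outstanding)
pbsdsh(): task 2 exit status 0
"""


def task_hosts(log: str) -> list[str]:
    """Return the hostname lines from `pbsdsh -v hostname` output, in order."""
    return [
        line.strip()
        for line in log.splitlines()
        if line.strip() and not line.startswith("pbsdsh(")
    ]


if __name__ == "__main__":
    hosts = task_hosts(PBSDSH_LOG)
    distinct = sorted(set(hosts))
    # For this log: "3 task(s) on 1 distinct host(s): a00551.science.domain"
    print(f"{len(hosts)} task(s) on {len(distinct)} distinct host(s): {', '.join(distinct)}")
```

To check a live job, capture the output (for example pbsdsh -v hostname > pbsdsh.log 2>&1, where pbsdsh.log is just an illustrative file name) and pass its contents to task_hosts. With a healthy multi-node allocation, the distinct count should equal the number of distinct nodes listed in $PBS_NODEFILE.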
Best,
Oswin
On 2016-09-08 15:42, r...@open-mpi.org wrote:
I’m pruning this email thread so I can actually read the blasted thing
:-)
Guys: you are off in the wilderness chasing ghosts! Please stop.
When I say that Torque uses an “ordered” file, I am _not_ saying that
all the host entries of the same name have to be listed consecutively.
I am saying that the _position_ of each entry has meaning, and you
cannot just change it.
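A minimal sketch of what an "ordered" file means, under an assumption made only for illustration: line i of $PBS_NODEFILE is slot (task) index i. The node names nodeA/nodeB and the function slot_map are hypothetical. The same host may appear on non-consecutive lines, but reordering the lines (here, sorting them) changes which host each index refers to, even though the per-host slot counts stay the same.

```python
from collections import Counter


def slot_map(lines: list[str]) -> dict[int, str]:
    """Map slot index -> host, assuming line i of the nodefile is slot i."""
    hosts = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines first
    return dict(enumerate(hosts))


original = ["nodeA", "nodeB", "nodeA"]  # same host on non-consecutive lines: fine
reordered = sorted(original)            # ["nodeA", "nodeA", "nodeB"]

# The per-host slot counts are identical...
assert Counter(original) == Counter(reordered)

# ...but the position -> host mapping is not: slot 1 now points at another host.
print(slot_map(original))   # {0: 'nodeA', 1: 'nodeB', 2: 'nodeA'}
print(slot_map(reordered))  # {0: 'nodeA', 1: 'nodeA', 2: 'nodeB'}
```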
I have honestly totally lost the root of this discussion in all the
white noise about the PBS_NODEFILE. Can we reboot?
Ralph
On Sep 8, 2016, at 5:26 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Oswin,
One more thing: can you run
pbsdsh -v hostname
before invoking mpirun?
Hopefully this will print the three hostnames.
Then you can run
ldd `which pbsdsh`
and see which libtorque.so it is linked with.
Cheers,
Gilles
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users