Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread Gus Correa
Hi Anthony, Ralph, Gilles, all As far as I know, for core/processor assignment to user jobs to work, Torque needs to be configured with cpuset support (configure --enable-cpuset ...). That is separate from what OpenMPI does in terms of process binding. Otherwise, the user processes in the job

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread r...@open-mpi.org
No problem - glad you were able to work it out! > On Oct 5, 2017, at 11:22 PM, Anthony Thyssen > wrote: > > Sorry r...@open-mpi.org as Gilles Gouaillardet > pointed out to me the problem wasn't OpenMPI, but with the specific EPEL >

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-06 Thread Anthony Thyssen
Sorry r...@open-mpi.org as Gilles Gouaillardet pointed out to me the problem wasn't OpenMPI, but with the specific EPEL implementation (see Redhat Bugzilla 1321154). Today, the server was able to be taken down for maintenance, and I wanted to try a few things. After installing EPEL Testing

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread r...@open-mpi.org
Can you try a newer version of OMPI, say the 3.0.0 release? Just curious to know if we perhaps “fixed” something relevant. > On Oct 3, 2017, at 5:33 PM, Anthony Thyssen wrote: > > FYI... > > The problem is discussed further in > > Redhat Bugzilla: Bug 1321154 -

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread Anthony Thyssen
FYI... The problem is discussed further in Redhat Bugzilla: Bug 1321154 - numa enabled torque don't work https://bugzilla.redhat.com/show_bug.cgi?id=1321154 I'd seen this previously, as it required me to add "num_node_boards=1" to each node in the /var/lib/torque/server_priv/nodes to get

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-03 Thread Anthony Thyssen
Thank you Gilles. At least I now have something to follow through with. As a FYI, the torque is the pre-built version from the Redhat Extras (EPEL) archive. torque-4.2.10-10.el7.x86_64 Normally pre-built packages have no problems, but in this case. On Tue, Oct 3, 2017 at 3:39 PM, Gilles

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread Anthony Thyssen
The stderr and stdout are saved to separate channels. It is interesting that the output from pbsdsh is node21.emperor 5 times, even though $PBS_NODES is the 5 individual nodes. Attached are the two compressed files, as well as the pbs_hello batch used. Anthony Thyssen ( System Programmer )
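
The mismatch described above (one hostname repeated while the allocation lists five nodes) can be checked mechanically. The following is a minimal sketch, not part of the original message: it assumes the allocation is available one host per line, as in the file named by Torque's $PBS_NODEFILE, and that the hostnames actually reported by the tasks have been saved to a second file. The function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: compare the hosts Torque allocated with the hosts that ran tasks.
# Both inputs are plain files with one hostname per line (an assumption).

# Print the number of distinct, non-empty hostnames in a file.
count_unique_hosts() {
    sort -u "$1" | grep -c .
}

# Print "ok" if both files name the same set of hosts, "mismatch" otherwise.
compare_hosts() {
    allocated=$(sort -u "$1")
    reported=$(sort -u "$2")
    if [ "$allocated" = "$reported" ]; then
        echo ok
    else
        echo mismatch
    fi
}
```

Inside a job script this might be used as `pbsdsh hostname > ran.txt; compare_hosts "$PBS_NODEFILE" ran.txt`. A "mismatch" result would correspond to the symptom reported in this thread.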

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread Gilles Gouaillardet
Anthony, in your script, can you
    set -x
    env
    pbsdsh hostname
    mpirun --display-map --display-allocation --mca ess_base_verbose 10 --mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname
and then compress and send the output? Cheers, Gilles On 10/3/2017 1:19 PM, Anthony Thyssen
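
One way to gather the requested diagnostics is a small wrapper that keeps stdout and stderr apart and compresses both. This is only a sketch: `capture_diag` and its file naming are hypothetical and are not part of Torque or Open MPI.

```shell
#!/bin/sh
# Sketch: run one diagnostic command, store stdout and stderr in separate
# files, then gzip both so they can be attached to a mail.
# Usage: capture_diag PREFIX COMMAND [ARGS...]
capture_diag() {
    prefix=$1
    shift
    # Run the command with its two output streams sent to separate files.
    "$@" > "$prefix.out" 2> "$prefix.err"
    status=$?
    # Compress both logs, overwriting older copies with the same name.
    gzip -f "$prefix.out" "$prefix.err"
    # Hand back the exit status of the diagnostic command itself.
    return "$status"
}
```

In the batch script, the mpirun line quoted above could then be run as `capture_diag mpirun_diag mpirun --display-map --display-allocation --mca ess_base_verbose 10 --mca plm_base_verbose 10 --mca ras_base_verbose 10 hostname`, which would leave `mpirun_diag.out.gz` and `mpirun_diag.err.gz` in the working directory.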

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread r...@open-mpi.org
One thing I can see is that the local host (where mpirun executed) shows as "node21" in the allocation, while all others show their FQDN. This might be causing some confusion. You might try adding "--mca orte_keep_fqdn_hostnames 1" to your cmd line and see if that helps. > On Oct 2, 2017, at
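
To illustrate why a short name next to fully qualified names can cause confusion, here is a small sketch that compares hostnames after dropping the domain part. It is not how Open MPI matches hosts internally; the helper names are hypothetical and the example hostnames follow those mentioned in this thread.

```shell
#!/bin/sh
# Strip everything after the first dot: node21.emperor becomes node21.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Print "same" when two names refer to the same host once domains are
# dropped, "different" otherwise.
same_host() {
    if [ "$(short_name "$1")" = "$(short_name "$2")" ]; then
        echo same
    else
        echo different
    fi
}
```

A plain string comparison treats "node21" and "node21.emperor" as two different hosts, whereas `same_host node21 node21.emperor` reports them as the same machine.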

Re: [OMPI users] OpenMPI with-tm is not obeying torque

2017-10-02 Thread Anthony Thyssen
Update... The problem of all processes running on the first node (an oversubscribed dual-core machine) is NOT resolved. Changing the mpirun command in the Torque batch script ("pbs_hello" - See previous) to mpirun --nooversubscribe --display-allocation hostname Then submitting to PBS/Torque using