Hi

With nodes=2:ppn=2, Torque will allocate two cores on each of
two nodes for your job.
Open MPI will honor this, and run only on those nodes and cores.

Torque will put the list of node names (each repeated twice, since
you asked for two ppn/cores) in a "node file" that can be accessed
in your job through the environment variable $PBS_NODEFILE.
That is the default hostfile (or host list) used by Open MPI when you
start your MPI executable with mpirun.
You don't need to add any hostfile or host list to the mpirun
command line, and in principle there is no reason to do so,
at least for SPMD (single-executable) programs,
as doing it by hand would be error prone.

You can easily print the contents of that file to your job's stdout
with:

cat $PBS_NODEFILE
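
For concreteness, here is a minimal Torque job script sketch of
this scheme (the job name is a placeholder, and it assumes your
Open MPI was built with Torque/tm support; neither detail is from
this thread):

#!/bin/bash
#PBS -N mpitest
#PBS -l nodes=2:ppn=2
cd $PBS_O_WORKDIR
# Show the node list Torque allocated (one line per requested core):
cat $PBS_NODEFILE
# No -np, -host, or -hostfile needed: Open MPI reads $PBS_NODEFILE
# and starts one process per listed slot (4 processes here).
/share/apps/computer/openmpi-2.0.1/bin/mpirun a.out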

If you add another hostfile or host list to the mpirun command line,
and that hostfile or host list conflicts with the contents of
$PBS_NODEFILE (say, it names a different set of nodes),
mpirun will fail.

In my experience, the only situation where you would need to modify
this scheme under Torque is when launching an MPMD (multiple-executables)
program, in which case you produce an --app appfile as a modified
version of $PBS_NODEFILE. However, that doesn't seem to be the case
here, as the mpirun command line in the various emails has a single
executable, "a.out".
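
For completeness, here is a sketch of that MPMD case; the
executable names "master" and "worker" are placeholders, not
programs from this thread:

# Build an appfile from $PBS_NODEFILE: one "master" rank on the
# first allocated slot, one "worker" rank on each remaining slot.
head -n 1 $PBS_NODEFILE | awk '{print "-np 1 -host "$1" ./master"}' > appfile
tail -n +2 $PBS_NODEFILE | awk '{print "-np 1 -host "$1" ./worker"}' >> appfile
mpirun --app appfile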

I hope this helps.
Gus Correa

On 07/31/2017 12:43 PM, Elken, Tom wrote:
“4 threads”   In MPI, we refer to this as 4 ranks or 4 processes.

So what is your question?   Are you getting errors with PBS?

-Tom

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Mahmood Naderan
Sent: Monday, July 31, 2017 9:27 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] -host vs -hostfile

Excuse me, my fault... I meant

nodes=2:ppn=2

is 4 threads.


Regards,
Mahmood

On Mon, Jul 31, 2017 at 8:49 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:

    ?? Doesn't that tell PBS to allocate 1 node with 2 slots on it? I
    don't see where you get 4.

    Sent from my iPad


    On Jul 31, 2017, at 10:00 AM, Mahmood Naderan
    <mahmood...@gmail.com> wrote:

        OK. The next question is how to use it with Torque (PBS)?
        Currently we write this directive

        Nodes=1:ppn=2

        which means 4 threads. Then we omit -np and -hostfile in the
        mpirun command.

        On 31 Jul 2017 20:24, "Elken, Tom" <tom.el...@intel.com> wrote:

            Hi Mahmood,

            With the -hostfile case, Open MPI is trying to helpfully run
            things faster by keeping both processes on one host.  Ways
            to avoid this…

            On the mpirun command line add:

            -pernode   (runs 1 process per node), or

            -npernode 1 ,  but these two have been deprecated in favor
            of the wonderful syntax:

            --map-by ppr:1:node

            Or you could change your hostfile to:

            cluster slots=1
            compute-0-0 slots=1
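
            For example, a quick sketch using the test program from
            this thread (mpirun path shortened):

            mpirun --map-by ppr:1:node -np 2 a.out

            With the slots=1 hostfile above, plain "mpirun -hostfile
            hosts -np 2 a.out" should likewise place one process on
            each host instead of packing both onto the first.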

            -Tom

            From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Mahmood Naderan
            Sent: Monday, July 31, 2017 6:47 AM
            To: Open MPI Users <users@lists.open-mpi.org>
            Subject: [OMPI users] -host vs -hostfile

            Hi,

            I am stuck on a problem which I don't remember having on
            previous versions. When I run a test program with -host,
            it works; I mean, the processes span the hosts I
            specified. However, when I specify -hostfile, it doesn't
            work!

            mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -host compute-0-0,cluster -np 2 a.out
            ****************************************************************************
            * hwloc 1.11.2 has encountered what looks like an error from the operating system.
            *
            * Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
            * Error occurred in topology.c line 1048
            *
            * The following FAQ entry in the hwloc documentation may help:
            *   What should I do when hwloc reports "operating system" warnings?
            * Otherwise please report this error message to the hwloc user's mailing list,
            * along with the output+tarball generated by the hwloc-gather-topology script.
            ****************************************************************************
            Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
            Hello world from processor compute-0-0.local, rank 0 out of 2 processors

            mahmood@cluster:mpitest$ cat hosts
            cluster
            compute-0-0

            mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts -np 2 a.out

            
            ****************************************************************************
            * hwloc 1.11.2 has encountered what looks like an error from the operating system.
            *
            * Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
            * Error occurred in topology.c line 1048
            *
            * The following FAQ entry in the hwloc documentation may help:
            *   What should I do when hwloc reports "operating system" warnings?
            * Otherwise please report this error message to the hwloc user's mailing list,
            * along with the output+tarball generated by the hwloc-gather-topology script.
            ****************************************************************************
            Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
            Hello world from processor cluster.hpc.org, rank 1 out of 2 processors


            How can I resolve that?

            Regards,
            Mahmood

