Hi Govind

Govind Songara wrote:
> Hi Gus,
> OpenMPI was not built with tm support.
> The submission/execution hosts do not have any of the
> PBS environment variables set
> (PBS_O_WORKDIR, PBS_NODEFILE).
> How can I set them?
> Regards,
> Govind
>

I missed the final part of your message,
about the Torque environment.

This is now more of a Torque question,
and you may want to ask it on the Torque mailing list:

http://www.supercluster.org/mailman/listinfo/torqueusers

The Torque system administration guide may also help:

http://www.clusterresources.com/products/torque/docs/

Anyway, Torque may not be fully configured on your cluster.
But first: how did you determine that PBS_O_WORKDIR
and PBS_NODEFILE are not set?

Try to put "ls $PBS_O_WORKDIR" and "cat $PBS_NODEFILE" in your
Torque/PBS script.  You can even comment out the mpirun command,
just to test the Torque environment.
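
For instance, a bare-bones test script along those lines could look
like this (the resource request and job name are just examples,
adjust them to your site):

#!/bin/sh
#PBS -l nodes=1:ppn=1
#PBS -N env_test
echo "PBS_O_WORKDIR is: $PBS_O_WORKDIR"
ls $PBS_O_WORKDIR
echo "PBS_NODEFILE is: $PBS_NODEFILE"
cat $PBS_NODEFILE
# mpirun line left commented out while testing the Torque environment
# /usr/lib64/openmpi/1.4-gcc/bin/mpirun -np 4 ./hello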

Torque sets both of them for each job:
PBS_NODEFILE points to a file listing the nodes and processors
you requested, and PBS_O_WORKDIR is just the directory
from which you launched the job with qsub.
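
For example, for a request of nodes=2:ppn=2, PBS_NODEFILE typically
contains one line per processor slot, something like this
(node names are just placeholders):

node01
node01
node02
node02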

Assuming your Torque is installed in /var/spool/torque
(it may be different on your system): on the head node,
does the file /var/spool/torque/server_priv/nodes
list all of your nodes, with the correct number of processors?

It should look something like this
("np" is the total number of 'cores' on each node):

node01 np=2
node02 np=2
...
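
You can also cross-check what the Torque server actually sees with
the pbsnodes command (assuming the Torque client tools are installed
on the head node):

pbsnodes -a

Each node should be reported with the expected np value and,
ideally, with state = free.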

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


On 9 June 2010 18:45, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Hi Govind

    Besides what Ralph said, make sure your OpenMPI was
    built with Torque ("tm") support.

    Suggestion:
    Do:

    ompi_info --all | grep tm

    It should show lines like these:

    MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.2)
    MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.2)
    ...
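
    If no tm lines show up and you ever rebuild Open MPI from source,
    tm support can be enabled at configure time, roughly like this
    (the Torque and install paths below are only placeholders):

    ./configure --with-tm=/usr/local/torque --prefix=/opt/openmpi-1.4
    make all install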

    ***

    If your OpenMPI doesn't have torque support,
    you may need to add the nodes list to your mpirun command.

    Suggestion:

    /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np 4
    ./hello
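
    A possible refinement, sketched here assuming a Bourne-type shell
    script: since the node file has one line per requested processor
    slot, the process count can be taken from it instead of being
    hard-coded, e.g.:

    NP=`wc -l < $PBS_NODEFILE`
    /usr/lib64/openmpi/1.4-gcc/bin/mpirun -hostfile $PBS_NODEFILE -np $NP ./hello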

    ***

    Also, assuming your OpenMPI has torque support:

    Did you request 4 nodes from torque?

    If you don't request the nodes and processors,
    torque will give you the default values
    (which may be one processor and one node).

    Suggestion:

    A script like this (adjusted to your site), tcsh style here,
    say, called run_my_pbs_job.tcsh:

    *********

    #! /bin/tcsh
    #PBS -l nodes=4:ppn=1
    #PBS -q default@your.torque.server
    #PBS -N myjob
    cd $PBS_O_WORKDIR
    /usr/lib64/openmpi/1.4-gcc/bin/mpirun -np 4 ./hello

    *********

    Then do:
    qsub run_my_pbs_job.tcsh

    **

    You can get more information about the PBS syntax using "man qsub".
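
    For example, the resources can also be requested directly on the
    qsub command line instead of inside the script (queue name and
    resource request are again just placeholders):

    qsub -q default -l nodes=4:ppn=1 run_my_pbs_job.tcsh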

    **

    I hope this helps,
    Gus Correa
    ---------------------------------------------------------------------
    Gustavo Correa
    Lamont-Doherty Earth Observatory - Columbia University
    Palisades, NY, 10964-8000 - USA
    ---------------------------------------------------------------------

    Ralph Castain wrote:


        On Jun 9, 2010, at 10:00 AM, Govind Songara wrote:

            Thanks Ralph, after giving the full path to hello it runs.
            But it runs on only one rank:
            Hello World! from process 0 out of 1 on node56.beowulf.cluster


        Just to check things out, I would do:

        mpirun --display-allocation --display-map -np 4 ....

        That should show you the allocation and where OMPI is putting
        the procs.

            There is also an error:
             >cat my-script.sh.e43
            stty: standard input: Invalid argument


        Not really sure here - must be an error in the script itself.




            On 9 June 2010 16:46, Ralph Castain <r...@open-mpi.org> wrote:

               You need to include the path to "hello" unless it sits in
               your PATH environment!

               On Jun 9, 2010, at 9:37 AM, Govind wrote:


                   #!/bin/sh
                   /usr/lib64/openmpi/1.4-gcc/bin/mpirun hello


                   On 9 June 2010 16:21, David Zhang <solarbik...@gmail.com> wrote:

                       What does your my-script.sh look like?

                       On Wed, Jun 9, 2010 at 8:17 AM, Govind <govind.r...@gmail.com> wrote:

                           Hi,

                           I have installed the following OpenMPI packages on
                           the worker node from the repo:
                           openmpi-libs-1.4-4.el5.x86_64
                           openmpi-1.4-4.el5.x86_64
                           mpitests-openmpi-3.0-2.el5.x86_64
                           mpi-selector-1.0.2-1.el5.noarch

                           torque-client-2.3.6-2cri.el5.x86_64
                           torque-2.3.6-2cri.el5.x86_64
                           torque-mom-2.3.6-2cri.el5.x86_64


                           I am having some problems running MPI jobs with
                           Torque:
                           qsub -q long -l nodes=4 my-script.sh
                           42.pbs1


                           cat my-script.sh.e41
                           stty: standard input: Invalid argument
--------------------------------------------------------------------------
                           mpirun was unable to launch the specified
                application as
                           it could not find an executable:

                           Executable: hello
                           Node: node56.beowulf.cluster

                           while attempting to start process rank 0.
                           ==================================

                           I could run the binary directly on the node
                           without any problem:
                            mpiexec -n 4 hello
                           Hello World! from process 2 out of 4 on
                           node56.beowulf.cluster
                           Hello World! from process 0 out of 4 on
                           node56.beowulf.cluster
                           Hello World! from process 3 out of 4 on
                           node56.beowulf.cluster
                           Hello World! from process 1 out of 4 on
                           node56.beowulf.cluster

                           Could you please advise if I am missing
                           anything here?


                           Regards
                           Govind


                       --
                       David Zhang
                       University of California, San Diego
