Daniel,

Just to be sure, are you running statfs?  The Moab scripts get their node
information from statfs, and Moab is only showing one node.

Once that is fixed, you may still see that Torque error.  Did you do
the following, as specified in the README?

3.  There only needs to be one pbs_mom running on the head node(s).
Since there is only one pbs_mom running, Torque needs to be aware of the
number of nodes in your cluster, otherwise job submission will fail if
the user requests more than one node.
To make Torque aware of the number of nodes in your cluster, execute
qmgr and enter something like the following on the qmgr command prompt:

Qmgr: s s resources_available.nodect = 91
Qmgr: s q batch resources_available.nodect=91
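
If you would rather script this than type it at the prompt, here is a
rough sketch.  It only prints the two qmgr directives (written out in
full, "set server" / "set queue", instead of the "s s" / "s q"
shorthand above).  NODECT, QUEUE and the nodect_cmds helper are
placeholder names of mine, and 91 is just the README's example value,
so set NODECT to the real node count of your cluster.

```shell
#!/bin/sh
# Sketch only: NODECT, QUEUE and nodect_cmds are illustrative names,
# not part of Torque or of the README.
NODECT=${NODECT:-91}     # README example value; use your own node count
QUEUE=${QUEUE:-batch}    # assumes the queue is called "batch"

# Print the two qmgr directives for a given node count and queue.
nodect_cmds() {
    printf 'set server resources_available.nodect = %s\n' "$1"
    printf 'set queue %s resources_available.nodect = %s\n' "$2" "$1"
}

# Only prints the directives; nothing is changed until they reach qmgr.
nodect_cmds "$NODECT" "$QUEUE"
```

Check the output first, then pipe it into qmgr as root on the head
node (qmgr reads directives from standard input when no -c option is
given).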

On Wed, 2008-09-10 at 17:38 -0400, Daniel Gruner wrote:
> Hi
> 
> I got an evaluation version of Moab (5.2.4), and torque (2.3.3), and
> after following the instructions in the
> sxcpu/moab_torque/README.Moab_Torque file, and running all of
> pbs_server, pbs_mom, and moab, it appears that moab only recognizes
> one node in my cluster.  This test cluster has a master and 2 slaves,
> each with 2 processors.
> 
> Here are my configuration files:
> 
> [EMAIL PROTECTED] torque]# cat server_priv/nodes
> dgk3.chem.utoronto.ca np=4
> 
> [EMAIL PROTECTED] torque]# cat mom_priv/config
> $preexec /opt/moab/tools/xcpu-torque-wrapper.sh
> 
> [EMAIL PROTECTED] moab]# cat moab.cfg
> ################################################################################
> #
> #  Moab Configuration File for moab-5.2.4
> #
> #  Documentation can be found at
> #  http://www.clusterresources.com/products/mwm/docs/moabadmin.shtml
> #
> #  For a complete list of all parameters (including those below) please see:
> #  http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
> #
> #  For more information on the initial configuration, please refer to:
> #  http://www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtm
> #
> #  Use 'mdiag -C' to check config file parameters for validity
> #
> ################################################################################
> 
> SCHEDCFG[Moab]        SERVER=dgk3:42559
> ADMINCFG[1]           USERS=root
> TOOLSDIR              /opt/moab/tools
> LOGLEVEL              3
> 
> ################################################################################
> #
> #  Resource Manager configuration
> #
> #  For more information on configuring a Resource Manager, see:
> #  http://www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
> #
> ################################################################################
> 
> RMCFG[dgk3]      TYPE=TYPE=NATIVE FLAGS=FULLCP
> RMCFG[dgk3]      CLUSTERQUERYURL=exec:///$TOOLSDIR/node.query.xcpu.pl
> RMCFG[dgk3]      WORKLOADQUERYURL=exec:///$TOOLSDIR/job.query.xcpu.pl
> RMCFG[dgk3]      JOBSTARTURL=exec:///$TOOLSDIR/job.start.xcpu.pl
> RMCFG[dgk3]      JOBCANCELURL=exec:///$TOOLSDIR/job.cancel.xcpu.pl
> 
> [EMAIL PROTECTED] moab]# cat tools/config.xcpu.pl
> #################################################################################
> # Configuration file for xcpu tools
> #
> # This was written by ClusterResources.  Modifications were made for XCPU by
> # Hugh Greenberg.
> ################################################################################
> 
> use FindBin qw($Bin);    # The $Bin directory is the directory this file is in
> 
> # Important:  Moab::Tools must be included in the calling script
> # before this config file so that homeDir is properly set.
> our ($homeDir);
> 
> # Set the PATH to include directories for bproc and torque binaries
> $ENV{PATH} = "$ENV{PATH}:/opt/torque/bin:/usr/bin:/usr/local/bin";
> 
> # Set paths as necessary -- these can be short names if PATH is included above
> $xstat    = 'xstat';
> $xrx      = 'xrx';
> $xk       = 'xk';
> $qrun     = 'qrun';
> $qstat    = 'qstat';
> $pbsnodes = 'pbsnodes';
> 
> # Set configured node resources
> $processorsPerNode = 2;        # Number of processors
> $memoryPerNode     = 2048;     # Memory in megabytes
> $swapPerNode       = 2048;     # Swap in megabytes
> 
> # Specify level of log detail
> $logLevel = 1;
> 
> # The default number of processors to run on
> $nodes = 1;
> 
> 
> Here is the output from "showq":
> 
> [EMAIL PROTECTED] ~]$ showq
> 
> active jobs------------------------
> JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME
> 
> 
> 0 active jobs               0 of 4 processors in use by local jobs (0.00%)
>                             0 of 1 nodes active      (0.00%)
> 
> eligible jobs----------------------
> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
> 
> 
> 0 eligible jobs
> 
> blocked jobs-----------------------
> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
> 
> 
> 0 blocked jobs
> 
> Total jobs:  0
> 
> 
> When I run the following script:
> 
> #!/bin/bash
> #PBS -l nodes=2
> #XCPU -p
> 
> date
> 
> it eventually finishes, but it runs both processes on the same node
> n0000.  If I specify
> more than 2 nodes (processes, really), the job aborts saying it
> doesn't have enough resources.  The issue seems to be that moab
> understands that it has only one active node - it appears to simply
> probe the master, since it is the node specified in the
> server_priv/nodes file, and there is a single mom running.
> 
> Any ideas?
> 
> Thanks,
> Daniel
-- 
Hugh Greenberg
Los Alamos National Laboratory, CCS-1
Email: [EMAIL PROTECTED]
Phone: (505) 665-6471
