Hi

I got an evaluation version of Moab (5.2.4) and Torque (2.3.3). After
following the instructions in the
sxcpu/moab_torque/README.Moab_Torque file and starting pbs_server,
pbs_mom, and moab, it appears that Moab recognizes only one node in
my cluster.  This test cluster has a master and 2 slaves, each with
2 processors.

Here are my configuration files:

[EMAIL PROTECTED] torque]# cat server_priv/nodes
dgk3.chem.utoronto.ca np=4

[EMAIL PROTECTED] torque]# cat mom_priv/config
$preexec /opt/moab/tools/xcpu-torque-wrapper.sh

[EMAIL PROTECTED] moab]# cat moab.cfg
################################################################################
#
#  Moab Configuration File for moab-5.2.4
#
#  Documentation can be found at
#  http://www.clusterresources.com/products/mwm/docs/moabadmin.shtml
#
#  For a complete list of all parameters (including those below) please see:
#  http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
#
#  For more information on the initial configuration, please refer to:
#  http://www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtm
#
#  Use 'mdiag -C' to check config file parameters for validity
#
################################################################################

SCHEDCFG[Moab]        SERVER=dgk3:42559
ADMINCFG[1]           USERS=root
TOOLSDIR              /opt/moab/tools
LOGLEVEL              3

################################################################################
#
#  Resource Manager configuration
#
#  For more information on configuring a Resource Manager, see:
#  http://www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
#
################################################################################

RMCFG[dgk3]      TYPE=TYPE=NATIVE FLAGS=FULLCP
RMCFG[dgk3]      CLUSTERQUERYURL=exec:///$TOOLSDIR/node.query.xcpu.pl
RMCFG[dgk3]      WORKLOADQUERYURL=exec:///$TOOLSDIR/job.query.xcpu.pl
RMCFG[dgk3]      JOBSTARTURL=exec:///$TOOLSDIR/job.start.xcpu.pl
RMCFG[dgk3]      JOBCANCELURL=exec:///$TOOLSDIR/job.cancel.xcpu.pl

[EMAIL PROTECTED] moab]# cat tools/config.xcpu.pl
#################################################################################
# Configuration file for xcpu tools
#
# This was written by ClusterResources.  Modifications were made for XCPU by
# Hugh Greenberg.
################################################################################

use FindBin qw($Bin);    # The $Bin directory is the directory this file is in

# Important:  Moab::Tools must be included in the calling script
# before this config file so that homeDir is properly set.
our ($homeDir);

# Set the PATH to include directories for bproc and torque binaries
$ENV{PATH} = "$ENV{PATH}:/opt/torque/bin:/usr/bin:/usr/local/bin";

# Set paths as necessary -- these can be short names if PATH is included above
$xstat    = 'xstat';
$xrx      = 'xrx';
$xk       = 'xk';
$qrun     = 'qrun';
$qstat    = 'qstat';
$pbsnodes = 'pbsnodes';

# Set configured node resources
$processorsPerNode = 2;        # Number of processors
$memoryPerNode     = 2048;     # Memory in megabytes
$swapPerNode       = 2048;     # Swap in megabytes

# Specify level of log detail
$logLevel = 1;

# The default number of processors to run on
$nodes = 1;


Here is the output from "showq":

[EMAIL PROTECTED] ~]$ showq

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME


0 active jobs               0 of 4 processors in use by local jobs (0.00%)
                            0 of 1 nodes active      (0.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 blocked jobs

Total jobs:  0


When I run the following script:

#!/bin/bash
#PBS -l nodes=2
#XCPU -p

date

it eventually finishes, but it runs both processes on the same node,
n0000.  If I specify more than 2 nodes (processes, really), the job
aborts, saying it doesn't have enough resources.  The issue seems to be
that Moab thinks it has only one active node: it appears to simply
probe the master, since that is the node specified in the
server_priv/nodes file, and there is a single mom running.
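
To compare what each layer reports about the nodes, something like the
following sketch could be run on the master.  It assumes that xstat,
pbsnodes, and mdiag are on the PATH and that xstat can be run without
arguments on this setup; the function name compare_views is only an
illustrative name, and any tool that is missing is skipped.

```shell
#!/bin/bash
# Sketch: print the node view of xcpu, Torque, and Moab one after another.
# Assumption: xstat works without arguments here; adjust if it needs options.

compare_views() {
    local cmd tool
    for cmd in "xstat" "pbsnodes -a" "mdiag -n"; do
        tool=${cmd%% *}          # first word is the executable name
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $cmd =="
            # Intentional word splitting so "pbsnodes -a" runs with its flag
            $cmd || echo "(command exited with an error)"
        else
            echo "== $cmd: not found in PATH =="
        fi
        echo
    done
}

compare_views
```

If xstat lists the slaves while pbsnodes and mdiag list only the master,
that would suggest the node list is being lost between xcpu and Torque
or Moab rather than in xcpu itself.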

Any ideas?

Thanks,
Daniel
