Hi,

I got an evaluation version of Moab (5.2.4) and Torque (2.3.3). After following the instructions in the sxcpu/moab_torque/README.Moab_Torque file and starting pbs_server, pbs_mom, and moab, it appears that Moab only recognizes one node in my cluster. This test cluster has a master and 2 slaves, each with 2 processors.

Here are my configuration files:

[EMAIL PROTECTED] torque]# cat server_priv/nodes
dgk3.chem.utoronto.ca np=4

[EMAIL PROTECTED] torque]# cat mom_priv/config
$preexec /opt/moab/tools/xcpu-torque-wrapper.sh

[EMAIL PROTECTED] moab]# cat moab.cfg
################################################################################
#
#  Moab Configuration File for moab-5.2.4
#
#  Documentation can be found at
#  http://www.clusterresources.com/products/mwm/docs/moabadmin.shtml
#
#  For a complete list of all parameters (including those below) please see:
#  http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
#
#  For more information on the initial configuration, please refer to:
#  http://www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtm
#
#  Use 'mdiag -C' to check config file parameters for validity
#
################################################################################

SCHEDCFG[Moab]        SERVER=dgk3:42559
ADMINCFG[1]           USERS=root
TOOLSDIR              /opt/moab/tools
LOGLEVEL              3

################################################################################
#
#  Resource Manager configuration
#
#  For more information on configuring a Resource Manager, see:
#  http://www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
#
################################################################################

RMCFG[dgk3]           TYPE=TYPE=NATIVE FLAGS=FULLCP
RMCFG[dgk3]           CLUSTERQUERYURL=exec:///$TOOLSDIR/node.query.xcpu.pl
RMCFG[dgk3]           WORKLOADQUERYURL=exec:///$TOOLSDIR/job.query.xcpu.pl
RMCFG[dgk3]           JOBSTARTURL=exec:///$TOOLSDIR/job.start.xcpu.pl
RMCFG[dgk3]           JOBCANCELURL=exec:///$TOOLSDIR/job.cancel.xcpu.pl

[EMAIL PROTECTED] moab]# cat tools/config.xcpu.pl
#################################################################################
# Configuration file for xcpu tools
#
# This was written by ClusterResources.  Modifications were made for XCPU by
# Hugh Greenberg.
################################################################################

use FindBin qw($Bin);    # The $Bin directory is the directory this file is in

# Important:  Moab::Tools must be included in the calling script
# before this config file so that homeDir is properly set.
our ($homeDir);

# Set the PATH to include directories for bproc and torque binaries
$ENV{PATH} = "$ENV{PATH}:/opt/torque/bin:/usr/bin:/usr/local/bin";

# Set paths as necessary -- these can be short names if PATH is included above
$xstat    = 'xstat';
$xrx      = 'xrx';
$xk       = 'xk';
$qrun     = 'qrun';
$qstat    = 'qstat';
$pbsnodes = 'pbsnodes';

# Set configured node resources
$processorsPerNode = 2;        # Number of processors
$memoryPerNode     = 2048;     # Memory in megabytes
$swapPerNode       = 2048;     # Swap in megabytes

# Specify level of log detail
$logLevel = 1;

# The default number of processors to run on
$nodes = 1;

Here is the output from "showq":

[EMAIL PROTECTED] ~]$ showq

active jobs------------------------
JOBID              USERNAME      STATE  PROCS   REMAINING            STARTTIME

0 active jobs             0 of 4 processors in use by local jobs (0.00%)
                          0 of 1 nodes active      (0.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE  PROCS     WCLIMIT            QUEUETIME

0 blocked jobs

Total jobs:  0

When I run the following script:

#!/bin/bash
#PBS -l nodes=2
#XCPU -p
date

it eventually finishes, but it runs both processes on the same node, n0000. If I specify more than 2 nodes (processes, really), the job aborts, saying it doesn't have enough resources.

The issue seems to be that Moab thinks it has only one active node: it appears to probe only the master, since that is the only node listed in the server_priv/nodes file, and there is a single mom running.

Any ideas?

Thanks,
Daniel
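
As a rough cross-check of the numbers above, here is a minimal Python sketch that tallies what a server_priv/nodes file declares. The function name count_nodes is made up for illustration, and the sketch assumes one "hostname np=N" entry per line, "#" starting a comment, and np defaulting to 1 when omitted. It only shows that the "1 node, 4 processors" reported by showq agrees with what the nodes file lists; it says nothing about how node.query.xcpu.pl actually gathers its data.

```python
def count_nodes(nodes_file_text):
    """Return (node_count, processor_count) declared in nodes-file text.

    Assumptions (not taken from the post): one "hostname np=N ..." entry
    per line, '#' starts a comment, and np defaults to 1 when omitted.
    """
    nodes = 0
    procs = 0
    for raw_line in nodes_file_text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        np = 1  # assumed default when no np= attribute is present
        for field in fields[1:]:
            if field.startswith("np="):
                np = int(field[len("np="):])
        nodes += 1
        procs += np
    return nodes, procs


if __name__ == "__main__":
    # The single entry from the server_priv/nodes file quoted in the post.
    print(count_nodes("dgk3.chem.utoronto.ca np=4"))
```

For the file as posted this gives one node with four processors, the same totals showq prints, even though the hardware described is three machines with two processors each.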
