Re: [SLUG] Clustering weirdness

2011-11-15 Thread Peter Chubb
>>>>> "Steven" == Steven Tucker tux...@gmail.com writes:

Steven Hi all, got a problem with my cluster using OpenMPI + Torque +
Steven Maui.

Steven I can submit 50 different jobs (single process) and the
Steven batching system will run all 50 in parallel, but I can't get an
Steven MPI job to run on more than 1 node. I assumed it must be my
Steven pbs script, but I have tried just about every config I can
Steven find/think of and still no luck.

I haven't used torque, but if it's anything like NQS, you need a
different batch queue that's configured with the nodes you want to be
able to use.  Also typically there's a different prologue and epilogue
(differently named files) for parallel as opposed to single-node jobs.


We used to have to do something like
   qmgr -c 'set queue batch16 resources_max.nodect=16'
to allow jobs submitted to the queue `batch16' to use up to 16 nodes,
for instance.   It's been fifteen years since I used NQS so my memory
may be faulty.  And of course, Torque has its own command set
(although I believe it's based on NQS).
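In Torque terms, the rough equivalent would be something like the sketch below (the queue name `batch' and the limit of 16 are assumptions -- substitute whatever `qstat -q' reports on your cluster):

```shell
# Show how the queue is currently configured, including any
# resources_max limits that would block multi-node jobs:
qmgr -c "print queue batch"

# Raise the per-job node-count ceiling on that queue to 16:
qmgr -c "set queue batch resources_max.nodect = 16"
```

If resources_max.nodect (or resources_max.nodes) is set to 1 on the queue, a nodes=2 job will never be eligible to run.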
--
Dr Peter Chubb  http://www.gelato.unsw.edu.au  peterc AT gelato.unsw.edu.au
http://www.ertos.nicta.com.au   ERTOS within National ICT Australia
-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


[SLUG] Clustering weirdness

2011-11-14 Thread Steven Tucker

Hi all,

got a problem with my cluster using OpenMPI + Torque + Maui.

I can submit 50 different jobs (single process) and the batching system 
will run all 50 in parallel, but I can't get an MPI job to run on more 
than 1 node. I assumed it must be my pbs script, but I have tried just 
about every config I can find/think of and still no luck.


pbsnodes -a produces the following, continuing all the way up to 16 
nodes (the second half are quad cores); I'm just showing the first 2 
for brevity.


tuxta@WPCluster:~$ pbsnodes -a
WPCluster.workstation.griffith.edu.au
 state = free
 np = 4
 ntype = cluster
 status = 
rectime=1321274238,varattr=,jobs=,state=free,netload=87819550163,gres=,loadave=0.00,ncpus=4,physmem=4047980kb,availmem=11466240kb,totmem=11860068kb,idletime=1574,nusers=1,nsessions=1,sessions=27627,uname=Linux 
WPCluster 2.6.32-34-server #77-Ubuntu SMP Tue Sep 13 20:54:38 UTC 2011 
x86_64,opsys=linux


node02
 state = free
 np = 2
 ntype = cluster
 status = 
rectime=1321274239,varattr=,jobs=,state=free,netload=191116409,gres=,loadave=0.00,ncpus=2,physmem=1021584kb,availmem=8642904kb,totmem=8832648kb,idletime=2258,nusers=0,nsessions=? 
15201,sessions=? 15201,uname=Linux node02 2.6.32-34-server #77-Ubuntu 
SMP Tue Sep 13 20:54:38 UTC 2011 x86_64,opsys=linux



The following file works fine when nodes=1, but when I make nodes=2 it 
never runs; the job just keeps a state of E rather than R.
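For reference, these are the standard Torque/Maui commands I'd use to poke at a job that's stuck like this (checkjob and checknode come from Maui; replace the job id and node name with your own):

```shell
# Full status of the stuck job, including any scheduler comment
# explaining why it hasn't started:
qstat -f <jobid>

# Maui's view of the job: which nodes it considered and why it
# rejected them (often the most informative output):
checkjob <jobid>

# What Maui thinks a given node can offer (processors, state, etc.):
checknode node02
```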



#!/bin/bash

#PBS -N Hello_Test
#PBS -l nodes=2:ppn=4

cd $PBS_O_WORKDIR

mpiexec -np 8 hello


Not sure what other info is helpful, so I don't want to put heaps of 
stuff here; if any more info is needed, just let me know.


Does anyone have any ideas?

Regards

Tuxta





Re: [SLUG] Clustering weirdness

2011-11-14 Thread Amos Shapira
On 14 November 2011 23:53, Steven Tucker tux...@gmail.com wrote:

 Hi all,

 got a problem with my cluster using OpenMPI + Torque + Maui.


I don't think OpenMPI is common enough that you'll find many people
with experience of it in this forum.

You might have better luck in OpenMPI-, Torque-, or Maui-specific forums,
e.g. http://www.open-mpi.org/community/lists/users/

--Amos