Hey Olof,
Your script doesn't seem to use the node file that PBS creates for the job ($PBS_NODEFILE).
Besides, it doesn't halt the LAM/MPI daemons gracefully.
Try this script and let me know if it works for you.
--
Jay
*****************************
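#!/bin/bash
#PBS -N Primejob
#PBS -q workq
#PBS -l nodes=16
echo "start it"
echo "HOME=$HOME"
# Boot LAM on the hosts PBS allocated to this job
lamboot "$PBS_NODEFILE"
echo "- LAM is ready"
cd "$PBS_O_WORKDIR"
# C = one process per CPU in the booted LAM universe
mpirun C mpi_prime
# Shut down the LAM daemons started by lamboot (lamhalt takes no host file)
lamhalt
echo "done"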
****************************
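A slightly more defensive sketch of the same job script, assuming bash and the LAM commands already used above: it stops early if lamboot or cd fails instead of running mpirun anyway, and it registers lamhalt with an EXIT trap right after lamboot succeeds, so the LAM daemons are shut down on whichever path the script exits by. The count_slots helper is hypothetical (not a PBS or LAM command); it only reports how many lines PBS wrote to $PBS_NODEFILE, which normally holds one line per allocated CPU.

```shell
#!/bin/bash
#PBS -N Primejob
#PBS -q workq
#PBS -l nodes=16
# Sketch: boot LAM on the hosts PBS assigned, run the program, and make
# sure lamhalt runs once LAM has been booted.

# Hypothetical helper: number of non-empty lines in a PBS node file.
# PBS normally writes one line per allocated CPU.
count_slots() {
    grep -c . "$1"
}

run_job() {
    echo "slots: $(count_slots "$PBS_NODEFILE")"
    lamboot "$PBS_NODEFILE" || return 1
    # From here on, halt the LAM daemons whenever the script exits.
    trap 'lamhalt' EXIT
    cd "$PBS_O_WORKDIR" || return 1
    # "C" = one process per CPU in the booted LAM universe.
    mpirun C mpi_prime
}

# Run only inside a PBS job (PBS_NODEFILE set), so the helper above can
# also be sourced and tried outside the batch system.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_job
fi
```

Outside a job nothing runs, so `count_slots somefile` can be tried on any saved node file.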
From: [EMAIL PROTECTED]
Reply-To: [email protected]
To: [email protected]
Subject: Oscar-users digest, Vol 1 #1576 - 1 msg
Date: Thu, 16 Mar 2006 20:12:48 -0800
Send Oscar-users mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/oscar-users
or, via email, send a message with subject or body 'help' to
[EMAIL PROTECTED]
You can reach the person managing the list at
[EMAIL PROTECTED]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Oscar-users digest..."
Today's Topics:
1. Re: Problem to submit jobs with qsub (Olof Mattsson)
--__--__--
Message: 1
Date: Thu, 16 Mar 2006 22:21:25 +0100
From: Olof Mattsson <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Oscar-users] Problem to submit jobs with qsub
Hi
We've installed some more nodes today, so
/var/spool/pbs/server_priv/nodes now looks like this:
node001.grendel.se np=2 all
node002.grendel.se all
node003.grendel.se all
node004.grendel.se all
node005.grendel.se all
node006.grendel.se all
node007.grendel.se all
node008.grendel.se all
node009.grendel.se all
node010.grendel.se all
node011.grendel.se all
node012.grendel.se all
node013.grendel.se all
node014.grendel.se all
node015.grendel.se all
Could the problem be that np= is specified only for node001?
Now we submit with
  qsub -l nodes=15,walltime=10:00 submit_prime
where submit_prime is:
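To see what the nodes file above actually declares, here is a small bash/awk sketch. list_node_cpus is a hypothetical helper, not a PBS command, and it assumes that a host line without np= counts as one CPU, which is my understanding of the PBS/Torque default. For the file above it should report 2 for node001 and 1 for each of the other fourteen hosts, 16 in total, the same number as the -np 16 in the script.

```shell
# Hypothetical helper: print "host cpus" for each entry of a PBS
# server_priv/nodes file, then "total N".
# Assumption: a host line without np= counts as one CPU.
list_node_cpus() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }   # skip blank lines and comments
        {
            np = 1                       # default when np= is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=/) np = substr($i, 4) + 0
            print $1, np
            total += np
        }
        END { print "total", total + 0 }
    ' "$1"
}
```

Usage: `list_node_cpus /var/spool/pbs/server_priv/nodes`. Comparing its total with what `pbsnodes -a` reports is a quick way to spot a host whose np= is missing or wrong.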
#!/bin/bash
#submit_prime
#PBS -N test_job
#PBS -o test_job_stdout.txt
#PBS -e test_job_stderr.txt
#PBS -q workq
mpiexec -v -boot -machinefile lamhosts -np 16 mpi_prime
#Nothing more to see here
This works the way we want. The problem now is that when we submit
more than one job, the first one starts and runs, but the second has
to be forced to start with qrun <id> as root.
I've set
  qmgr -c 'set queue workq resources_default.nodes = 1'
  qmgr -c 'set server resources_default.nodes = 1'
but it doesn't seem to help.
Thanks
/Olof Mattsson
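With -l nodes=15 and fifteen hosts in the nodes file, each job may be asking for every host, in which case the second job is simply waiting for nodes the first one holds, and qrun starts it regardless. One way to check is to see which nodes PBS reports as free while the first job runs; if Maui is the scheduler, its checkjob <jobid> command may also say why a job is still queued. The free_nodes function below is a hypothetical helper that assumes the usual Torque `pbsnodes -a` layout: an unindented host name followed by indented `key = value` lines.

```shell
# Hypothetical helper: read `pbsnodes -a` output on stdin and print the
# names of nodes whose state is exactly "free".
# Assumes host names start in column 1 and attributes are indented.
free_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }                     # start of a node block
        $1 == "state" && $3 == "free" { print node }
    '
}
```

Usage: `pbsnodes -a | free_nodes`. If it prints nothing while the first job runs, the second job cannot get the nodes it asked for until the first one finishes.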
> Hi there,
>
> On Tuesday 14 March 2006 17:06, Olof Mattsson wrote:
>
>> The master node is a dual AMD 2400+ MP, 2GB RAM, with two NICs; eth1 is
>> connected to our lab network and eth0 is connected to a Summit4. The
>> first node is exactly the same but with only one NIC. The other four
>> nodes are AMD 1900+, 1GB RAM. Everything is connected through the
>> Summit. We run Fedora Core 3 and OSCAR 4.2.
>>
>>
> ...
>
>> qsub -l nodes=1:ppn=2+4:ppn=1,walltime=10:00 submit_pi
>> Runs the job on the four single-CPU nodes
>>
>
> This is what you should be using. Did you check whether
> /var/spool/pbs/server_priv/nodes contains the correct CPUs per node?
>
> Did you try it the other way round: -l nodes=4:ppn=1+1:ppn=2?
>
> Here's what works for me (4 CPUs per node):
>
> [EMAIL PROTECTED] ~]$ qsub -I -l nodes=1:ppn=2+1:ppn=3
> qsub: waiting for job 393.slm.cluster to start
> qsub: job 393.slm.cluster ready
>
> [EMAIL PROTECTED] ~]$ pbsdsh hostname
> bench16.cluster
> bench15.cluster
> bench16.cluster
> bench15.cluster
> bench15.cluster
>
>
> If the job doesn't start, try "qrun $jobid" as root.
>
> This might also be a scheduler (Maui) issue.
>
> Regards,
> Erich
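The nodes= specs in this thread add up as count times ppn over the +-separated parts, with ppn defaulting to 1. A hypothetical bash helper, spec_cpus, for checking how many CPUs a spec asks for; it only does arithmetic on the string, does not query PBS, and ignores any other node properties.

```shell
# Hypothetical helper: total processors requested by a PBS "nodes=" spec,
# e.g. 1:ppn=2+4:ppn=1. Each "+"-separated part is COUNT[:ppn=N][:prop...];
# a part that starts with a host name instead of a number counts as one node.
spec_cpus() {
    local spec=$1 part count ppn field total=0
    local -a parts fields
    IFS='+' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        IFS=':' read -r -a fields <<< "$part"
        count=${fields[0]}
        [[ $count =~ ^[0-9]+$ ]] || count=1    # host name => one node
        ppn=1                                   # default: one CPU per node
        for field in "${fields[@]:1}"; do
            [[ $field == ppn=* ]] && ppn=${field#ppn=}
        done
        total=$(( total + count * ppn ))
    done
    echo "$total"
}
```

For example, `spec_cpus 1:ppn=2+1:ppn=3` prints 5, matching the five host names pbsdsh printed in Erich's session, and Olof's 1:ppn=2+4:ppn=1 comes to 6.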