Hi
We've installed some more nodes today, so
/var/spool/pbs/server_priv/nodes now looks like this:
node001.grendel.se np=2 all
node002.grendel.se all
node003.grendel.se all
node004.grendel.se all
node005.grendel.se all
node006.grendel.se all
node007.grendel.se all
node008.grendel.se all
node009.grendel.se all
node010.grendel.se all
node011.grendel.se all
node012.grendel.se all
node013.grendel.se all
node014.grendel.se all
node015.grendel.se all
Could the problem be that np= is specified only for node001?
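One way to check what the nodes file actually declares is to add up the
slots per line. The sketch below is only an illustration: count_slots is a
hypothetical helper, not a PBS tool, and it assumes that a line without
np= counts as a single processor. For the file listed above it would
report 16 (2 for node001 plus 1 for each of the other 14 nodes).

```shell
#!/bin/bash
# Count the processor slots declared in a PBS nodes file.
# Assumption: a line without np= contributes one slot.
count_slots() {
    awk '
        NF == 0 || /^#/ { next }            # skip blank lines and comments
        {
            np = 1                          # default when np= is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) { split($i, kv, "="); np = kv[2] }
            total += np
        }
        END { print total + 0 }
    ' "$1"
}

# Usage (path taken from the message above):
#   count_slots /var/spool/pbs/server_priv/nodes
```

If the number printed differs from the number of CPUs physically present,
the np= values in the nodes file are the first thing to correct.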
But now we submit with qsub -l nodes=15,walltime=10:00 submit_prime, where
submit_prime is:
#!/bin/bash
#submit_prime
#PBS -N test_job
#PBS -o test_job_stdout.txt
#PBS -e test_job_stderr.txt
#PBS -q workq
mpiexec -v -boot -machinefile lamhosts -np 16 mpi_prime
#Nothing more to see here
And this works the way we want it to. The problem now is that when we
submit more than one job, the first one starts and runs, but the second
has to be forced to start with qrun <id> as root.
I've set qmgr -c 'set queue workq resources_default.nodes = 1' and qmgr
-c 'set server resources_default.nodes = 1' but it doesn't seem to help...
Thanks
/Olof Mattsson
Hi there,
On Tuesday 14 March 2006 17:06, Olof Mattsson wrote:
> The master node is a dual AMD 2400+ MP, 2GB RAM, with two NICs: eth1 is
> connected to our lab network and eth0 is connected to a Summit4. The
> first node is exactly the same but with only one NIC. The other four
> nodes are AMD 1900+, 1GB RAM. Everything is connected through the
> Summit. We run Fedora Core 3 and OSCAR 4.2.
> ...
> qsub -l nodes=1:ppn=2+4:ppn=1,walltime=10:00 submit_pi
> Runs the job on the four single-CPU nodes
This is what you should be using. Did you check whether
/var/spool/pbs/server_priv/nodes contains the correct CPUs per node?
Did you try the other way round: -l nodes=4:ppn=1+1:ppn=2
Here's what works for me (4 CPUs per node):
[EMAIL PROTECTED] ~]$ qsub -I -l nodes=1:ppn=2+1:ppn=3
qsub: waiting for job 393.slm.cluster to start
qsub: job 393.slm.cluster ready
[EMAIL PROTECTED] ~]$ pbsdsh hostname
bench16.cluster
bench15.cluster
bench16.cluster
bench15.cluster
bench15.cluster
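The five hostnames above match the request: 1 node with 2 processors plus
1 node with 3 processors. As a rough sketch of that arithmetic, the
hypothetical helper below (total_procs is not part of PBS) sums
count * ppn over the "+"-separated parts of a nodes= specification. It
assumes every part starts with a numeric node count and that a missing
ppn means 1; host names or node properties in place of the count are not
handled.

```shell
#!/bin/bash
# Sum the processors requested by a PBS "-l nodes=" specification.
# Each "+"-separated chunk is expected to look like COUNT[:ppn=N].
total_procs() {
    local spec=$1 total=0 chunk count ppn
    local IFS='+'                       # split the spec on "+"
    for chunk in $spec; do
        count=${chunk%%:*}              # text before the first ":"
        ppn=1                           # assumed default when ppn is absent
        case $chunk in
            *:ppn=*) ppn=${chunk##*:ppn=}; ppn=${ppn%%:*} ;;
        esac
        total=$(( total + count * ppn ))
    done
    echo "$total"
}

total_procs "1:ppn=2+1:ppn=3"   # prints 5
total_procs "1:ppn=2+4:ppn=1"   # prints 6
```

The second call is the request quoted earlier in this thread, so a job
submitted that way should see six processors if all of them are granted.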
If the job doesn't start, try "qrun $jobid" as root.
This might as well be some scheduler (maui) issue...
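To narrow down whether the server or the scheduler is holding a job back,
something like the following could be run on the head node. It is only a
sketch: try and diagnose_queue are hypothetical wrapper names, showq is
available only where Maui is installed, and each command is skipped if it
is not on the PATH.

```shell
#!/bin/bash
# Run a diagnostic command only if it is installed; otherwise say so.
try() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $1 not found, skipping"
    fi
}

diagnose_queue() {
    try qmgr -c 'print server'   # server and queue settings, including "scheduling"
    try pbsnodes -a              # state and np of every node known to the server
    try qstat -f                 # full status of all jobs, including comments
    try showq                    # the scheduler's view of the queue (Maui only)
}

# Usage: diagnose_queue
```

In the qmgr output, the line to look for is "set server scheduling = True".
If scheduling is off, or the scheduler is not running, queued jobs would
stay queued until someone runs qrun by hand.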
Regards,
Erich
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users